(54) ROBOT DEVICE, ROBOT DEVICE ACTION CONTROL METHOD, EXTERNAL FORCE
DETECTING DEVICE AND EXTERNAL FORCE DETECTING METHOD



(57) A robot (1) is proposed which includes a speech recognition unit (101) to detect information supplied simultaneously with, or just before or after, detection of a touch by a touch sensor; an associative memory/recall memory (104) to store an action made in response to the touch and the input information (speech signal) detected by the speech recognition unit (101) in association with each other; and an action generator (105) to control the robot (1) to make the action recalled by the associative memory/recall memory (104) based on newly acquired input information (speech signal). The robot (1) also includes a sensor data processor (102) to allow the robot (1) to act in response to the touch detection by the touch sensor. Thus, the robot (1) can learn an action in association with an input signal such as a speech signal.
Description
Technical field

[0001] The present invention generally relates to a robot apparatus, a method for controlling the action of the robot apparatus, and an external-force detecting apparatus and method.

Background Art 

[0002] Conventionally, knowledge acquisition and language acquisition have been based mainly on associative memory of visual information and audio information.

[0003] "Learning Words from Natural Audio-Visual Input" (by Deb Roy and Alex Pentland) (referred to as "Document 1" hereunder) discloses a study of language learning from input speech and input images. The learning method in Document 1 is outlined below.

[0004] An image signal and a speech signal (acoustic signal) are supplied to a learning system simultaneously with each other or at different times. In Document 1, a pair of image and speech supplied in this way is called an "AV event".

[0005] When the image signal and speech signal are thus supplied, image processing is performed to detect a color and shape from the image signal, while speech processing applies a recurrent neural network to the speech signal and makes a phonemic analysis of it. More particularly, the input image is classified into a class (a class for recognition of a specific image, or image recognition class) based on a feature in the image feature space, while the input speech is classified into a class (a class for recognition of a specific sound, or sound recognition class) based on a feature in the sound feature space. The feature space is composed of a plurality of elements as shown in FIG. 1. For the image signal, for example, the feature space is a two-dimensional or multi-dimensional space whose elements are the color-difference signal and the brightness signal. Since the input image has a predetermined mapping of its elements in such a feature space, a color can be recognized based on the element mapping. In the feature space, the classification is made on the basis of distance in order to recognize a color.
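
The distance-based classification described above can be sketched as follows. This is a minimal illustration only: the two-dimensional (brightness, color-difference) feature space, the class centers and the function name `classify` are assumptions made for the example and are not taken from Document 1.

```python
import math

# Hypothetical class centers in a 2-D feature space (brightness, color difference).
# The values are illustrative assumptions, not data from Document 1.
CLASS_CENTERS = {
    "red": (0.4, 0.8),
    "blue": (0.3, -0.7),
    "yellow": (0.9, 0.2),
}

def classify(feature, centers=CLASS_CENTERS):
    """Return the class whose center is nearest to the feature, and that distance."""
    name = min(centers, key=lambda c: math.dist(feature, centers[c]))
    return name, math.dist(feature, centers[name])

label, distance = classify((0.45, 0.75))
```

Here the input feature lies closest to the assumed "red" center, so it is assigned to that class.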
[0006] For recognition of a sound, for example, the continuous recognition HMM (hidden Markov model) method is employed. The continuous recognition HMM method (referred to simply as "HMM" hereunder) permits a speech signal to be recognized as a phoneme sequence. The above recurrent neural network is one through which a signal is fed back to the input layer side.

[0007] Based on a correlation concerning concurrence (correlative learning), a classified phoneme is correlated with a stimulus (image) classified by the image processing for the purpose of learning. That is, the name and description of a thing shown as an image are acquired as a result of the learning.

[0008] As shown in FIG. 2, in the above learning, an input image is identified (recognized) according to image classes including "red thing", "blue thing", ..., each formed from image information, while an input speech is identified (recognized) according to classes including uttered "red", "blue", "yellow", ..., formed from sound information.

[0009] The image and speech thus classified are then correlated with each other by the correlative learning, whereby when "a red thing" is supplied as an input image, a learning system 200 in FIG. 2 can output the phoneme sequence of "red" (uttered) as a result of the correlative learning.

[0010] Recently, there has been proposed a robot apparatus which can behave autonomously in response to the surrounding environment (external factor) and its internal state (internal factor such as the state of an emotion or instinct). Such a robot apparatus (referred to as "robot" hereunder) is designed to interact with human beings or the environment. For example, there have been proposed so-called pet robots and the like, each shaped like an animal and behaving like the animal.

[0011] For example, enabling such a robot to learn various kinds of information will make it more entertaining. In particular, the capability of learning actions or behavior will make playing with the robot more fun.

[0012] Applying the aforementioned learning method (as in Document 1) to a robot designed to be controlled to act encounters the following problems.

[0013] First, the above learning method is not appropriately set up to control the robot to act.

[0014] As disclosed in Document 1, an appropriate phoneme sequence is created and output as an utterance if a stored word is generated in response to an input signal or the input signal is judged to be a new signal. However, for interaction with human beings or the environment, the robot is not required to utter an input signal as it is; it is required to act appropriately in response to an input.

[0015] Also, when classification is based on distance in the image feature space and sound feature space, acquired images and speech that are near to each other in those feature spaces will be treated as similar information. However, the robot is required to act differently in response to such images and speech in some cases. In such cases, the classification has to be done so as to yield appropriate action. The conventional methods cannot accommodate such requirements.
[0016] Conventional knowledge or language acquisition systems mainly include the following:

(1) Means for classifying image signals and generating new classes

(2) Means for classifying acoustic signals and generating new classes

(3) Means for correlating results from items (1) and (2) with each other, or learning image and sound in association with each other

[0017] Of course, some conventional knowledge or language acquisition systems use functions other than the above, but the above three functions are essential to such systems.

[0018] The classifications in items (1) and (2) above include mapping in a feature space, parametric discrimination of a significant signal using prior knowledge, use of a probabilistic classification, etc.
[0019] Generally, an image can be recognized, for example, by controlling a threshold of a color template for each of colors such as red, blue, green and yellow in the color space, or by determining, for a presented color stimulus, a probability of each color based on the distance between an existing color storage area and the input color in the feature space. For example, for an area already classified as a feature in a feature space as shown in FIG. 1, a probability of the classification is determined from the distance between the area defined by a feature of an input image and the existing feature area. A neural network method is also effective for this purpose.
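
One way to turn such distances into class probabilities is sketched below. The Gaussian kernel, the value of `sigma`, the class centers and the function name `class_probabilities` are all assumptions chosen for illustration; the text does not specify how the probability is computed from the distance.

```python
import math

def class_probabilities(feature, centers, sigma=0.5):
    """Map distances to normalized scores: nearer classes get higher probability.

    The Gaussian kernel and sigma are assumptions made for illustration.
    """
    scores = {
        name: math.exp(-math.dist(feature, center) ** 2 / (2 * sigma ** 2))
        for name, center in centers.items()
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

# Hypothetical color class centers in a 2-D feature space.
centers = {"red": (0.4, 0.8), "blue": (0.3, -0.7), "green": (-0.5, -0.4)}
probs = class_probabilities((0.45, 0.75), centers)
```

The resulting values sum to one, and the class whose stored area is closest to the input receives the largest share.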

[0020] On the other hand, for learning speech, a phoneme sequence supplied by the HMM through phoneme detection and a stored phoneme sequence are compared with each other, and a word is probabilistically recognized based on a result of the comparison.
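
The comparison of a detected phoneme sequence against stored sequences can be sketched as below. The phoneme symbols, the vocabulary and the use of Python's `difflib.SequenceMatcher` as the similarity measure are assumptions for illustration; an actual system would use HMM likelihoods rather than this simple sequence ratio.

```python
from difflib import SequenceMatcher

# Hypothetical stored vocabulary: word -> phoneme sequence (illustrative symbols).
STORED = {
    "red": ["r", "eh", "d"],
    "blue": ["b", "l", "uw"],
    "yellow": ["y", "eh", "l", "ow"],
}

def recognize(phonemes, stored=STORED):
    """Score each stored word by sequence similarity and normalize the scores."""
    scores = {w: SequenceMatcher(None, phonemes, seq).ratio() for w, seq in stored.items()}
    total = sum(scores.values()) or 1.0
    return {w: s / total for w, s in scores.items()}

probs = recognize(["r", "eh", "t"])   # a noisy rendering of "red"
best = max(probs, key=probs.get)
```

Even with one phoneme misdetected, the stored word sharing the most phonemes in order receives the highest normalized score.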

[0021] The means for generating new classes in items (1) and (2) above include the following:

[0022] An input signal is evaluated to determine whether it belongs to an existing class. When the input signal is determined to belong to an existing class, it is made to belong to that class and fed back to the classification method. On the other hand, if the input signal is judged not to belong to any class, a new class is generated and learning is performed so that classification is done based on the input stimulus.
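
The decision between assigning an input to an existing class and generating a new class can be sketched as follows. The distance threshold, the use of class means as centers and the naming scheme for new classes are assumptions made for this example only.

```python
import math

def assign_or_create(feature, classes, threshold=0.3):
    """Assign feature to the nearest class within threshold, else create a class.

    classes maps a class name to the list of member features.
    """
    def center(members):
        return tuple(sum(axis) / len(members) for axis in zip(*members))

    best = min(classes, key=lambda n: math.dist(feature, center(classes[n])), default=None)
    if best is not None and math.dist(feature, center(classes[best])) <= threshold:
        classes[best].append(feature)  # feed the sample back into the existing class
        return best
    name = f"class_{len(classes)}"     # hypothetical naming scheme
    classes[name] = [feature]
    return name

classes = {"class_0": [(0.4, 0.8)]}
first = assign_or_create((0.45, 0.75), classes)   # near class_0
second = assign_or_create((-0.6, -0.5), classes)  # far from every class
```

The first input falls within the threshold and joins the existing class, while the second lies far from every class and therefore opens a new one.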

[0023] A new class is generated as follows. For example, if an image class is judged not to belong to any existing class (class of image A, class of image B, ...), an existing class (e.g., class of image A) is divided to generate a new image class as shown in FIG. 3A. If a sound class is judged not to belong to any existing class (class of sound α, class of sound β, ...), an existing class (e.g., class of sound β) is divided to generate a new sound class as shown in FIG. 3B.

[0024] Also, the association of an image and sound in item (3) includes an associative memory or the like.

[0025] A discrimination class for an image is represented by a vector (referred to as "image discrimination vector" hereunder) IC[i] (i = 0, 1, ..., NIC-1), and a discrimination class for a sound is represented by a vector (referred to as "sound discrimination vector" hereunder) SC[j] (j = 0, 1, ..., NSC-1). For an image signal and sound signal presented (supplied for learning), the probability or evaluation result of each recognition class is set as the corresponding vector value.

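[0026] In a self-recalling associative memory, the image recognition vector IC and the sound recognition vector SC are made into a single vector CV given by the following equations (1) and (2):

$$\mathrm{CV}[n] = \mathrm{IC}[n] \quad (0 \le n < \mathrm{NIC}) \qquad (1)$$

$$\mathrm{CV}[n] = \mathrm{SC}[n - \mathrm{NIC}] \quad (\mathrm{NIC} \le n < \mathrm{NIC} + \mathrm{NSC}) \qquad (2)$$
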

[0027] Note that in the field of self-recalling associative memory, the so-called Hopfield net proposed by Hopfield is well known.

[0028] The single vector thus formed is used as described below. Assuming that the vector CV is a column vector, the self-recalling associative memory is formed by adding a matrix delta_W, given by the following equation (3), to the currently stored matrix W:

$$\mathrm{delta\_W} = \mathrm{CV} \times \mathrm{trans}(\mathrm{CV}) \qquad (3)$$

[0029] Thus, an image stimulus (input image) can be regarded as a class, and a word obtained as a result of speech recognition (e.g., an HMM class) can be associated with that class. When a new image (e.g., a red thing) is presented and the speech "red" is entered, each image class reacts to the red of the image stimulus to an extent appropriate to the stimulus, that is, to the distance in the feature space, and similarly each sound class reacts to an appropriate extent to the phoneme sequence of the speech "red". These classes are handled as a correlative matrix in the above equations and stochastically averaged, so that the image and speech classes have high values with respect to the same stimulus, namely, a high correlation between them. Thus, when a red image is presented, the HMM class "red (uttered)" is stored in association with the red image.
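
The following sketch illustrates equations (1) to (3) with small vectors. The vector sizes, the one-hot class activations and the linear `recall` step (multiplying W by a partial cue) are assumptions for illustration; the text defines only how CV is formed and how delta_W is added to W.

```python
def concat(ic, sc):
    """Equations (1) and (2): CV holds IC followed by SC."""
    return list(ic) + list(sc)

def store(w, cv):
    """Equation (3): add delta_W = CV x trans(CV) to the stored matrix W in place."""
    for i, a in enumerate(cv):
        for j, b in enumerate(cv):
            w[i][j] += a * b

def recall(w, cue):
    """Multiply W by a partial cue vector to obtain the associated activations."""
    return [sum(w_ij * c for w_ij, c in zip(row, cue)) for row in w]

NIC, NSC = 2, 2   # two image classes and two sound classes (assumed sizes)
w = [[0.0] * (NIC + NSC) for _ in range(NIC + NSC)]
store(w, concat([1.0, 0.0], [1.0, 0.0]))   # "red thing" image with uttered "red"
store(w, concat([0.0, 1.0], [0.0, 1.0]))   # "blue thing" image with uttered "blue"
sound_part = recall(w, concat([1.0, 0.0], [0.0, 0.0]))[NIC:]
```

Presenting only the image part of the "red" pair as a cue activates the first sound class and leaves the second at zero, which is the association the stored matrix is meant to capture.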
[0030] On the other hand, "Perceptually Grounded Meaning Creation" (by Luc Steels, ICMAS, Kyoto, 1996) (referred to as "Document 2" hereunder) discloses meaning acquisition through an experiment called the "discrimination game". The discrimination game is outlined below.

[0031] The "discrimination game" system includes a plurality of sensor channels and feature detectors, not limited to image and sound as above. A thing called an "agent" (e.g., a piece of software) tries, by means of the feature detectors, to differentiate between a newly presented object and another object (one already recognized); namely, it differentiates between the objects based on a feature. If no feature exists with which the objects can be differentiated, a new feature detector is created which corresponds to the newly presented object. If an object has no feature with which it can be differentiated from another object, namely, when no corresponding feature detector is available for the object, the agent is judged to have lost the discrimination game. If an object has a corresponding feature detector, the agent is judged to be a winner of the game.

[0032] The entire system then works on the "selectionist" principle. That is, an agent having won the game has a higher probability of survival, while an agent having lost the game will create a new feature detector. However, the new feature detector will be used in the next game, and it is not known whether the detector will provide a correct result. Thus, an agent capable of better differentiation will survive.

[0033] The discrimination game has been outlined above. In other words, such a discrimination game may be regarded as a method for creating better feature detectors through natural selection.
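
A single round of such a game can be sketched as follows. The sensor channels, the threshold-style detectors and the function names are hypothetical and are used only to illustrate the idea of succeeding when some detector separates the presented object from the others and creating a new detector otherwise; this is not the procedure of Document 2 itself.

```python
def play_round(detectors, topic, others):
    """Return the name of a detector separating topic from all others, or None."""
    for name, detect in detectors.items():
        if all(detect(topic) != detect(other) for other in others):
            return name
    return None

def make_threshold_detector(channel, threshold):
    """Create a detector that splits one sensor channel at a threshold."""
    return lambda obj: obj[channel] >= threshold

# Objects are dicts of sensor-channel readings (hypothetical channels and values).
topic = {"hue": 0.9, "size": 0.2}
others = [{"hue": 0.1, "size": 0.2}, {"hue": 0.2, "size": 0.8}]

detectors = {"size>=0.5": make_threshold_detector("size", 0.5)}
first = play_round(detectors, topic, others)   # size alone cannot separate the topic
if first is None:
    # On failure, add a detector on another channel for use in later games.
    detectors["hue>=0.5"] = make_threshold_detector("hue", 0.5)
second = play_round(detectors, topic, others)
```

The first round fails because one of the other objects has the same size category as the topic; after a detector on the hue channel is added, the second round succeeds.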

[0034] Also, "The Spontaneous Self-Organization of An Adaptive Language" (by Luc Steels, in Muggleton, S. (ed.), 1996, Machine Intelligence 15) (referred to as "Document 3" hereunder) describes language generation by a "language game" method. The "language game" includes the following three steps:

First step: Propagation

Second step: Creation. In this step, an agent creates a new word and associates it with a new feature.

Third step: Self-organization. In this step, the system organizes itself through sorting and selection.

[0035] More specifically, the language game includes a first step of so-called image processing, a second step concerning a word, related to language processing (in practice, however, no speech is recognized; characters are entered instead), and a third step in which an image acquired in the first step is associated with the word. The aforementioned discrimination game has no part equivalent to the second step; it is applied only to differentiation in an existing feature space.

[0036] Also, "Language Acquisition with A Conceptual Structure-Based Speech Input from Perceptual Information" (by Iwasaki and Tamura, Sony Computer Science Laboratory) (referred to as "Document 4" hereunder) discloses acquisition of a grammar using HMM for speech recognition and, for image recognition, typical patterns (circular, triangular or other shapes, in red, blue and other colors) displayed in color on a computer monitor.

[0037] In Document 4, the user clicks a pointing device or mouse (with a pointer 212 pointed) on a pattern (an object) on a monitor 210 as shown in FIG. 4 and simultaneously utters "red circle" or the like. The discrimination game theory for color images and HMM-based speech recognition are used to carry out the first to third steps of the language game in Document 3 probabilistically.

[0038] For generation of a new class, a predetermined verification method is carried out. In the method disclosed in Document 4, when the verification using HMM for speech recognition judges that a new class should be generated, the HMM is subdivided to generate the new class.

[0039] Further, a pattern 211 (first object (Obj1)) selected by pointing the cursor at it and clicking the mouse is moved onto a second object (Obj2) 213 as indicated by an arrow in FIG. 4, and at the same time an uttered speech "mount" is supplied, to recognize the movement of the pattern made on the monitor 210. The movement thus recognized is classified by HMM.

[0040] As in the foregoing, a variety of techniques for knowledge or language acquisition have been proposed. However, these techniques have the following problems with respect to action acquisition (action learning) in a robot:

(1) Evaluation of distance in a feature space and of the belongingness of an input signal to a class

(2) Creation and evaluation of action

(3) Sharing of a target object between robot and user (so-called target object sharing)
[0041] The above problem (1) is difficult to solve since evaluation of the belongingness of an input image signal to a class is influenced only by information related to the image signal, by a sound signal supplied at the same time, or by stored information recalled based on the two signals. Note that a belongingness-to-class evaluation is an index of which class an input signal belongs to.

[0042] Assume here that an image signal has been entered which is considered very near to the image signals of an existing class in the feature space. In this case, classes A and B are near to each other in the image feature space as shown in FIG. 5A. However, it is assumed that the image signal thus entered is intended to generate a new class.

[0043] On the other hand, if under these conditions it is judged that a speech signal has been entered as other information on the object corresponding to the image signal, and that the input speech signal is very far from the existing classes, a new class for the speech signal will be generated for the object. So it is assumed, for example, that as shown in FIG. 5B, a class of sound α (sound class corresponding to the class of image A) and a class of sound β (sound class corresponding to the class of image B) are mapped differently in the sound feature space, so that a threshold S2 can be set.

[0044] Therefore, if the belongingness-to-class evaluation of a sound, made based on the feature space, can be reflected in the belongingness-to-class evaluation of an image, it is possible to generate a new class for the image. For example, by reflecting the belongingness-to-class evaluation in the sound feature space, a threshold can be set between the classes of images A and B, which are near to each other, to differentiate between the classes as shown in FIG. 5A. That is, by referring to another belongingness-to-class evaluation, belongingness to a class can be evaluated more appropriately.
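
As a minimal sketch of this idea, the example below places a threshold on a one-dimensional image feature using the sound class attached to each sample. The one-dimensional feature, the labels "alpha" and "beta", the midpoint rule and the function name are assumptions for illustration only.

```python
def image_threshold_from_sound(samples):
    """Place an image-feature threshold between samples whose sound classes differ.

    samples: list of (image_feature, sound_class) pairs with a 1-D image feature.
    Returns the midpoint between the two sound-labelled groups, or None when the
    groups overlap on the image axis.
    """
    alpha = [x for x, s in samples if s == "alpha"]
    beta = [x for x, s in samples if s == "beta"]
    if not alpha or not beta:
        return None
    lo, hi = (alpha, beta) if max(alpha) < min(beta) else (beta, alpha)
    if max(lo) >= min(hi):
        return None
    return (max(lo) + min(hi)) / 2

# Image features that are close together but carry clearly different sound classes.
samples = [(0.48, "alpha"), (0.50, "alpha"), (0.54, "beta"), (0.56, "beta")]
threshold = image_threshold_from_sound(samples)
```

Although the image features are close together, the sound labels split them into two groups, so a threshold can be placed between the groups on the image axis.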

[0045] However, if the classes of image signals or speech signals are very near to each other, the above is not sufficient to generate a new class for the image or speech. That is, when image classes or sound classes are near to each other in their respective feature spaces as shown in FIGS. 6A and 6B, they cannot be differentiated even if they have quite different features as viewed from a third feature space. The third feature space may be indicative of a feature of action.

[0046] Accordingly, the present invention has an object to overcome the above-mentioned drawbacks of the prior art by providing a robot apparatus and a method for controlling the action of the robot apparatus, adapted to appropriately differentiate objects in their respective feature spaces.

[0047] The above problem (2) is how to generate new action of the robot apparatus, and how to evaluate that new action, when a signal judged to belong to a new class is supplied to the robot apparatus.

[0048] With the conventional technique, evaluation of language creation corresponds to evaluation of generated action. With the technique disclosed in Document 3, an arbitrary phoneme sequence is generated. It will be the name or the like of an object contained in an input signal, possibly an image signal. However, an arbitrary motion series should not be generated to generate action.

[0049] For example, even if an arbitrary series of joint angles is generated for a robot apparatus having, for example, four legs each with three degrees of freedom, the robot apparatus will not make any meaningful motion. When language is generated, the phoneme sequence of the language will only be a name of the object. For action, however, the problem is how to evaluate whether the generated action is good or not.

[0050] The present invention also has another object to overcome the above-mentioned drawbacks of the prior art by providing a robot apparatus and a method for controlling the action of the robot apparatus, adapted to generate appropriate action for an input.

[0051] The above-mentioned problem (3) is the so-called target object sharing (shared attention). This problem arises because the information perceived by the robot apparatus is highly variable. For example, even when the user or trainer tries to teach the robot apparatus by holding an orange ball in a direction not towards the image signal input unit (e.g., CCD camera) of the robot apparatus and uttering "orange ball", if the object within the field of view of the robot apparatus is a pink box, the "pink" box will be associated with the speech "orange ball".

[0052] In Document 4, the pattern 211 on the monitor 210 is designated as a target object by pointing the cursor at the pattern 211 and clicking the mouse. In the real world, however, no such means for pointing at or designating a target object is available. Even if the theories disclosed in Documents 2 and 3 are applied to the robot apparatus, the trainer or user of the robot apparatus will select at random one of the things in his or her field of view and utter the name of the selected thing from memory, to direct the robot apparatus's attention towards the selected thing as a target object to be recognized by the robot apparatus. In practice, however, this is not learning by which the robot apparatus can recognize the target object.

[0053] The present invention also has another object to overcome the above-mentioned drawbacks of the prior art by providing a robot apparatus and a method for controlling the action of the robot apparatus, adapted to share a target object (attention sharing) in order to appropriately recognize the target object.

[0054] The conventional robot apparatus detects an external force applied to its head or another part via a touch sensor or the like provided at the head, thereby interacting with the user. However, the interaction is limited by the number of sensors provided and their locations.

Disclosure of the Invention

[0055] Accordingly, the present invention has an object to overcome the above-mentioned drawbacks of the prior art by providing a robot apparatus, an external force detector and a method for detecting an external force, capable of assuring a higher degree of freedom in interaction with a touch (external force) by the user.

[0056] The present invention has another object to provide a robot apparatus and a method for controlling the action 
of the robot apparatus capable of appropriately recognizing each object in its feature space. 
[0057] The present invention has another object to provide a robot apparatus and a method for controlling the action 
of the robot apparatus, capable of generating appropriate action in response to an input. 

[0058] The present invention has another object to provide a robot apparatus and a method for controlling the action 
of the robot apparatus, capable of sharing a target object (attention sharing) to appropriately recognize the target object. 
[0059] The present invention has another object to provide a robot apparatus, an external force detector and a method for detecting an external force, capable of assuring a higher degree of freedom in interaction with a touch (external force) by the user.

[0060] The above object can be attained by providing a robot apparatus including:

means for detecting a touch;

means for detecting information supplied simultaneously with, or just before or after, the touch detection by the touch detecting means;

means for storing action made correspondingly to the touch detection in association with the input information detected by the input information detecting means; and

means for recalling action from the information in the storing means based on newly acquired input information to control the robot apparatus to do the action.

[0061] In the above robot apparatus, information supplied just before or after the touch detection by the touch detecting means is detected by the input information detecting means, action made in response to the touch and the input information detected by the input information detecting means are stored in association with each other in the storing means, and action is recalled by the action controlling means from the information in the storing means based on newly acquired input information to control the robot apparatus to do the action.

[0062] Thus, in the above robot apparatus, input information and action made when the input information has been detected are stored in association with each other, and when information identical to the input information is supplied again, the corresponding action is reproduced.
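
The store-and-recall behavior summarized above can be sketched as follows. The class `ActionMemory`, the count-based association and the example words and action names are hypothetical; they only illustrate storing input information together with an action and reproducing that action when the same input is supplied again.

```python
class ActionMemory:
    """Associates input information (e.g. a recognized word) with an action."""

    def __init__(self):
        self._assoc = {}   # word -> {action: count}

    def store(self, word, action):
        """Record that `action` was made while `word` was detected."""
        counts = self._assoc.setdefault(word, {})
        counts[action] = counts.get(action, 0) + 1

    def recall(self, word):
        """Return the action most often associated with `word`, or None."""
        counts = self._assoc.get(word)
        return max(counts, key=counts.get) if counts else None

memory = ActionMemory()
# Hypothetical teaching episodes: a touch elicits an action while a word is spoken.
memory.store("sit", "sit_down")
memory.store("sit", "sit_down")
memory.store("sit", "lie_down")
memory.store("back", "walk_backward")
recalled = memory.recall("sit")
unknown = memory.recall("jump")
```

A word heard during teaching recalls the action most often paired with it, and a word never paired with any action recalls nothing.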

[0063] Also the above object can be attained by providing a method for controlling the action of a robot apparatus, including the steps of:

detecting a touch made to the robot apparatus;

detecting information supplied simultaneously with or just before or after the touch detection in the touch detecting step;

storing action made in response to the touch detection in the touch detecting step and input information detected in the input information detecting step in association with each other into a storing means; and

recalling action from the information in the storing means based on newly acquired input information to control the robot to do the action.

[0064] In the above robot apparatus action controlling method, input information and action made when the input information has been detected are stored in association with each other, and when information identical to the input information is supplied again, the corresponding action is reproduced.
[0065] Also the above object can be attained by providing a robot apparatus including:

means for detecting input information;

means for storing the input information detected by the input information detecting means and action result information indicative of a result of action made correspondingly to the input information detected by the input information detecting means; and

means for identifying action result information in the storing means based on newly supplied input information to control the robot apparatus to do action based on the action result information.

EP 1 195 231 A1

[0066] In the above robot apparatus, action result information indicative of a result of action made correspondingly to the input information detected by the input information detecting means and the input information are stored in association with each other in the storing means, and action result information in the storing means is identified based on newly supplied input information to control the robot apparatus to do action based on the action result information.
[0067] Thus in the above robot apparatus, input information and action result information indicative of action made correspondingly to the input information are stored in association with each other, and when identical information is supplied again, past action is recalled based on the action result information corresponding to the input information to control the robot apparatus to do appropriate action.

[0068] Also the above object can be attained by providing a method for controlling the action of a robot apparatus, including the steps of:

storing action result information indicative of a result of action made correspondingly to input information detected by an input information detecting means and the input information itself in association with each other into a storing means; and

identifying action result information in the storing means based on newly supplied input information to control the robot apparatus to make action based on the action result information.

[0069] By the above robot apparatus action controlling method, the robot apparatus stores input information and action result information indicative of a result of action made based on the input information in association with each other, and when identical input information is supplied again, past action is recalled based on action result information corresponding to the input information to control the robot apparatus to do appropriate action.
[0070] Also the above object can be attained by providing a robot apparatus including:

means for detecting input information;

means for detecting a feature of the input information detected by the input information detecting means;

means for classifying the input information based on the detected feature;

means for controlling the robot apparatus to do action based on the input information; and

means for changing the classification of the input information having caused the robot apparatus to do the action based on action result information indicative of a result of the action made by the robot apparatus under the control of the action controlling means.

[0071] In the above robot apparatus, a feature of input information detected by the input information detecting means is detected by the feature detecting means, the input information is classified based on the detected feature, the robot apparatus is controlled by the action controlling means to act based on the classification of the input information, and the classification of the input information having caused the robot apparatus to act is changed based on action result information indicative of a result of the action made by the robot apparatus under the control of the action controlling means.

[0072] Thus the above robot apparatus acts correspondingly to the classification of input information and changes the classification based on a result of the action.
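
A minimal sketch of changing a classification based on an action result is given below. The nearest-prototype classifier, the rule of opening a new class when the action fails, and all names and values are assumptions for illustration; the text does not prescribe how the classification is changed.

```python
import math

class AdaptiveClassifier:
    """Nearest-prototype classifier whose classes change with action results."""

    def __init__(self, prototypes):
        self.prototypes = dict(prototypes)   # class name -> feature tuple

    def classify(self, feature):
        """Return the class whose prototype is nearest to the feature."""
        return min(self.prototypes, key=lambda c: math.dist(feature, self.prototypes[c]))

    def feedback(self, feature, success):
        """Keep the classification on success; otherwise open a new class."""
        if success:
            return self.classify(feature)
        name = f"class_{len(self.prototypes)}"   # hypothetical naming scheme
        self.prototypes[name] = tuple(feature)
        return name

clf = AdaptiveClassifier({"class_0": (0.0, 0.0), "class_1": (1.0, 1.0)})
before = clf.classify((0.4, 0.4))         # nearest prototype is class_0
clf.feedback((0.4, 0.4), success=False)   # the action taken for class_0 failed
after = clf.classify((0.4, 0.4))
```

After the failed action, the same input is no longer assigned to its original class, so a different action can be associated with it.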

[0073] Also the above object can be attained by providing a method for controlling the action of a robot apparatus, including the steps of:

detecting a feature of input information detected by an input information detecting means;

classifying the input information based on the feature detected in the feature detecting step;

controlling the robot apparatus to act based on the classification of the input information, made in the information classifying step; and

changing the classification of the input information having caused the robot apparatus to do the action based on action result information indicative of a result of the action made by the robot apparatus controlled in the action controlling step.

[0074] By the above robot apparatus action controlling method, the robot apparatus is controlled to act correspondingly to the classification of input information and changes the classification based on a result of the action.

[0075] Also, the above object can be attained by providing a robot apparatus including:

means for identifying a target object; 

means for storing information on the target object identified by the target object identifying means; and
means for controlling the robot apparatus to act based on information on a newly detected object and information
on the target object, stored in the storing means.

[0076] The above robot apparatus stores information on a target object identified by the target object identifying
means into the storing means, and is controlled by the action controlling means to act based on the information on the
newly detected object and information on the target object, stored in the storing means.

[0077] Thus the above robot apparatus stores a target object, and when information on an identical object is supplied
again, the robot apparatus makes a predetermined action.

[0078] Also, the above object can be attained by providing a method for controlling the action of a robot apparatus, 
including the steps of:

identifying a target object;

storing information on the target object identified in the target object identifying step into a storing means; and
controlling the robot apparatus to act based on information on a newly detected object and information on the
target object, stored in the storing means.

[0079] By the above robot apparatus action controlling method, the robot apparatus stores a target object, and when
information on an identical object is supplied again, the robot apparatus makes a predetermined action.

Also the above object can be attained by providing a robot apparatus including:

moving members,

joints to move the moving members,

detecting means for detecting the state of the joint to which an external force is applied via the moving member; and
means for learning the joint state detected by the detecting means and external force in association with each other.

[0080] In the above robot apparatus, the state of the joint to which an external force is applied via the moving member
can be detected by the detecting means and the joint state detected by the detecting means and external force are
learned in association with each other by the learning means. That is, the robot apparatus learns an external force in
association with a joint state which varies correspondingly to the external force acting on the moving member.
[0081] Also, the above object can be attained by providing an external force detector including:

means for detecting the state of a joint which moves a moving member; and 

means for detecting an external force acting on the moving member based on the joint state detected by the joint 
state detecting means. 

[0082] In the above external force detector, the state of the joint which moves the moving member is detected by
the joint state detecting means and the external force acting on the moving member is detected based on the joint
state detected by the joint state detecting means. Namely, the external force detector detects an external force acting
on the moving member based on the state of a joint which moves the moving member.
[0083] Also, the above object can be attained by providing a method for detecting an external force, including the
steps of:

detecting the state of a joint which moves a moving member; 

detecting an external force acting on the moving member based on the detected joint state; and
detecting the external force acting on the moving member based on the state of the joint which moves the moving
member.

Brief Description of the Drawings 

[0084]

FIG. 1 shows a feature space for detection of a feature of an input signal.

FIG. 2 is a block diagram of a learning system including recognition classes for image and sound information.
FIGS. 3A and 3B explain the generation of a new recognition class.
FIG. 4 explains the language acquisition with a conceptual structure-based speech input from perceptual information
(as in Document 4 by Iwahashi et al).

FIGS. 5A and 5B explain the relation between an image feature space and sound feature space. 

FIGS. 6A to 6C explain the relation between an image feature space, sound feature space and third feature space. 

FIG. 7 is a perspective view of a robot apparatus according to the present invention. 

FIG. 8 is a block diagram of the circuit configuration of the robot apparatus in FIG. 7. 

FIG. 9 is a block diagram of the software configuration of the robot apparatus in FIG. 7. 

FIG. 10 is a block diagram of a middleware layer in the software configuration in the robot apparatus in FIG. 7. 

FIG. 11 is a block diagram of an application layer in the software configuration in the robot apparatus in FIG. 7.
FIG. 12 is a block diagram of an action model library in the application layer in FIG. 11.

FIG. 13 explains a finite probabilistic automaton serving as information for determination of action of the robot
apparatus.

FIG. 14 shows a state transition table prepared in each node in the finite probabilistic automaton.
FIG. 15 is a block diagram of a part, according to the present invention, of the robot apparatus in FIG. 7.

FIGS. 16A and 16B explain a teaching of a motion to the robot apparatus. 
FIGS. 17A and 17B show a discrimination unit which teaches a motion to the robot apparatus.
FIG. 18 is a block diagram of discriminators for learning a motion.

FIG. 19 is a characteristic curve of pulse widths used in the motion learning, showing the pulse widths obtained when
the robot apparatus is in standing position.

FIG. 20 is a characteristic curve of pulse widths used in the motion learning, showing the pulse widths obtained when
the robot apparatus in standing position is pushed forward at the back thereof.

FIG. 21 is a characteristic curve of pulse widths used in the motion learning, showing the pulse widths obtained when
the robot apparatus in standing position is pushed backward at the back thereof.
FIG. 22 is a characteristic curve of pulse widths used in the motion learning, showing the pulse widths obtained when
the robot apparatus in standing position is pushed upward at the neck thereof.

FIG. 23 is a characteristic curve of pulse widths used in the motion learning, showing the pulse widths obtained when
the robot apparatus in standing position is pushed downward at the neck thereof.

FIG. 24 is a characteristic curve of pulse widths used in the motion learning, showing the pulse widths obtained when
the robot apparatus in sitting position is pushed upward at the neck thereof.

FIG. 25 is a characteristic curve of pulse widths used in the motion learning, showing the pulse widths obtained when
the robot apparatus in sitting position is pushed downward at the neck thereof.

FIG. 26 is a block diagram of a pleasantness/unpleasantness judgment unit in the robot device. 

FIG. 27 explains a neural network.
FIG. 28 is a block diagram of a configuration of the robot apparatus according to the present invention, intended
for learning an external force.

FIG. 29 shows a three-layer back-propagation neural network. 

FIG. 30 shows a system of neurons in each layer in the three-layer back-propagation neural network. 

FIG. 31 is a characteristic curve of sigmoid function. 
FIG. 32 is a characteristic curve of the relation between number of times of learning and mean error.

FIG. 33 is a detailed block diagram of the speech recognition unit in the robot apparatus. 

FIG. 34 is a block diagram of an associative memory/recall memory and action generator in the robot apparatus.

FIG. 35 is a block diagram of the associative memory/recall memory used in the robot apparatus.

FIG. 36 is a detailed block diagram of a sensor data processor in the robot apparatus.
FIGS. 37A to 37E explain a shared learning for recognition of a target object by pointing the finger.

FIG. 38 is a schematic block diagram of the associative memory system. 

FIG. 39 shows an example of a competitive learning neural network of a two-layer hierarchical type used in the 
associative memory system in FIG. 38. 

FIG. 40 shows an example of the variation, due to an epoch, of an association between an input neuron activated
by an input pattern and one not activated, and a neuron in the competitive layer.

FIG. 41 shows a contents tree of a hierarchical action decision system used for testing the action deciding operation
of the robot apparatus.

FIG. 42 shows time changes of Hunger and Sleepy included in the instincts in the first operation test.
FIG. 43 shows time changes of Activity, Pleasantness and Certainty included in the emotions in the first operation
test.

FIG. 44 shows time changes of Sleep and Eat as motivations in the first operation test. 

FIG. 45 shows time changes of the instincts in the second operation test. 

FIG. 46 shows time changes of the emotions in the second operation test. 

FIG. 47 shows time changes of a release mechanism (RM) in the second operation test. 


Best Mode for Carrying Out the Invention 

[0085] The best mode for carrying out the present invention will be described in detail with reference to the
accompanying drawings. The best mode concerns an autonomous robot apparatus which autonomously behaves
correspondingly to its surrounding environment (external factor) or internal state (internal factor).

[0086] First the construction of the robot apparatus will be described, and then the applications of the present invention
to the robot apparatus will be described in detail.

(1) Construction of the robot apparatus according to the present invention

[0087] As shown in FIG. 7, the robot apparatus (will be referred to simply as "robot" hereunder) is generally indicated
with a reference 1. It is a pet robot shaped in the likeness of a "dog". As shown, the robot 1 includes a body unit 2,
leg units 3A to 3D joined to the front right and left and rear right and left, respectively, of the body unit 2, and a head
unit 4 and tail unit 5 joined to the front and rear ends, respectively, of the body unit 2.

[0088] As shown in FIG. 8, the body unit 2 houses a CPU (central processing unit) 10, DRAM (dynamic random-access
memory) 11, flash ROM (read-only memory) 12, PC (personal computer) card interface circuit 13 and a signal
processing circuit 14, all connected to each other via an internal bus 15 to form a controller 16, and further a battery
17 to supply power to the robot 1. Further the body unit 2 houses an angular velocity sensor 18 and acceleration
sensor 19, to detect the orientation and acceleration of the robot 1, etc.

[0089] The head unit 4 houses a CCD (charge coupled device) camera 20 to image the environment surrounding
the robot 1, a touch sensor 21 to detect a pressure given to the robot 1 as physical action such as "patting" or "hitting"
by the user, a distance sensor 22 to measure a distance to an object existing before the robot 1, a microphone 23 to
collect external sounds, a speaker 24 to output a sound such as barking, LEDs (light emitting diodes) (not shown) as
"eyes" of the robot 1, and so on, disposed in place, respectively.

[0090] Further, actuators 25₁ to 25ₙ and potentiometers 26₁ to 26ₙ are disposed in joints of the leg units 3A to 3D,
articulations between the leg units 3A to 3D and body unit 2, an articulation between the head unit 4 and body unit 2,
and in an articulation between a tail 5A and tail unit 5, respectively. The numbers of actuators and potentiometers used
in each joint and articulation depend upon the degree of freedom of the actuator and potentiometer. For example, each
of the actuators 25₁ to 25ₙ uses a servo motor. As the servo motor is driven, the leg units 3A to 3D are controlled to
shift to a target posture or motion.

[0091] Each of the angular velocity sensor 18, acceleration sensor 19, touch sensor 21, distance sensor 22, microphone
23, speaker 24, LEDs, actuators 25₁ to 25ₙ, and potentiometers 26₁ to 26ₙ is connected to the signal processing
circuit 14 of the controller 16 via a corresponding one of hubs 27₁ to 27ₙ, and CCD camera 20 and battery 17 are
connected directly to the signal processing circuit 14.

[0092] Note that signals from the angular velocity sensor 18, acceleration sensor 19 and potentiometers 26₁ to 26ₙ
are used in learning of the motion (action) as will further be described later.

[0093] The signal processing circuit 14 sequentially acquires data supplied from each of the above sensors (will be
referred to as "sensor data" hereinunder), image data and speech data, and stores each of them into place in the
DRAM 11 via the internal bus 15. Also the signal processing circuit 14 sequentially acquires data supplied from the
battery 17 and indicating the remaining potential in the battery 17, and stores the data into place in the DRAM 11.

[0094] Based on each of the sensor data, image data, speech data and remaining battery potential data thus stored
in the DRAM 11, the CPU 10 will control the action of the robot 1.
[0095] Actually, after the power is initially supplied to the robot 1, the CPU 10 reads a control program either from a
memory card 28 set in a PC card slot (not shown) in the body unit 2, via the PC card interface circuit 13, or directly
from the flash ROM 12, and stores it into the DRAM 11.

[0096] Also, the CPU 10 determines the internal state of the robot 1, environment surrounding the robot 1, the
existence of an instruction or action from the user, etc. based on the sensor data, image data, speech data, remaining
battery potential data sequentially stored from the signal processing circuit 14 into the DRAM 11 as in the above.

[0097] Further, the CPU 10 decides next action based on the determination result and the control program stored in
the DRAM 11, and drives the necessary ones of the actuators 25₁ to 25ₙ for the next action on the basis of the result
of determination to thereby shake or nod the head unit 4, wag the tail 5A of the tail unit 5 or drive the leg units 3A to
3D for walking.

[0098] Also at this time, the CPU 10 generates speech data as necessary and supplies it as speech signals to the
speaker 24 via the signal processing circuit 14, thereby outputting a voice or speech created from the speech signals,
and turns the LEDs on or off or makes them flicker.

[0099] Thus, the robot 1 autonomously behaves correspondingly to its internal state or surrounding environment, or
an instruction or action from the user.

(2) Software structure of the control program 

[0100] The above control program for the robot 1 has a software structure as shown in FIG. 9. As shown, a device
driver layer 30 is positioned in the lowest layer of the control program, and consists of a device driver set 31 including
a plurality of device drivers. In this case, each device driver is an object allowed to make a direct access to the CCD
camera 20 (see FIG. 8) and an ordinary hardware used in a computer such as a timer, and works with an interrupt
from an appropriate hardware.

[0101] As shown in FIG. 9, a robotic server object 32 is also positioned in the lowest layer of the device driver layer
30. This object 32 consists of, for example, a virtual robot 33 including a software group which provides an interface
for access to hardware such as the above-mentioned various sensors, actuators 25₁ to 25ₙ, etc., a power manager
34 including a software group which manages power switching etc., a device driver manager 35 including a software
group which manages other various device drivers, and a designed robot 36 including a software group which manages
the mechanism of the robot 1.

[0102] There is also provided a manager object 37 consisting of an object manager 38 and service manager 39. The
object manager 38 is a software group to manage start and termination of each of software groups included in the
robotic server object 32, middleware layer 40 and application layer 41, respectively. The service manager 39 is a
software group which manages the association between objects on the basis of information on an association between
objects stated in an association file stored in the memory card 28 (see FIG. 8).

[0103] The middleware layer 40 is positioned above the robotic server object 32 and consists of a software group 
which provides basic functions of the robot 1 such as image processing, speech processing, etc. The application layer 
41 is positioned above the middleware layer 40 and consists of a software group which decides action of the robot 1 
based on the result of a process effected by each software group included in the middleware layer 40. 

[0104] The software structures of the middleware layer 40 and application layer 41 are shown in detail in FIG. 10.
[0105] As shown in FIG. 10, the middleware layer 40 consists of a recognition system 60 including signal processing
modules 50 to 58 intended for noise detection, temperature detection, brightness detection, scale detection, distance
detection, posture detection, touch sensing, motion detection and color recognition, respectively, and an input semantics
converter module 59, and an output system 69 including an output semantics converter module 68 and signal
processing modules 61 to 67 intended for posture management, tracking, motion reproduction, walking, recovery from
overturn, LED lighting and speech reproduction, respectively.

[0106] The signal processing modules 50 to 58 in the recognition system 60 acquire appropriate ones of sensor data,
image data and speech data read from the DRAM 11 (see FIG. 8) by the virtual robot 33 in the robotic server object
32, process the data in a predetermined manner and supply the data processing result to the input semantics converter
module 59. In this example, the virtual robot 33 is formed as a function to transfer or convert signals under a
predetermined communication rule.

[0107] Based on the data processing result supplied from the signal processing modules 50 to 58, the input semantics
converter module 59 recognizes the internal state and surrounding environment of the robot 1 such as "noisy", "hot",
"bright", "ball was detected", "overturn was detected", "patted", "hit", "musical scale was heard", "moving object was
detected" or "obstacle was detected", and an instruction or action from the user, and outputs the recognition result to
the application layer 41 (see FIG. 8).

[0108] As shown in FIG. 11, the application layer 41 is composed of five modules including an action model library 70,
action switching module 71, learning module 72, emotion module 73 and an instinct module 74.
[0109] As shown in FIG. 12, the action model library 70 has provided therein independent action modules 70₁ to 70ₙ
corresponding to some preselected conditional items, respectively, such as "when battery potential has become low",
"for recovery from overturn", "for avoiding an obstacle", "for expression of an emotion", "when ball has been detected",
etc.

[0110] For example, when these action modules 70₁ to 70ₙ are supplied with the recognition result from the input
semantics converter module 59 or when a predetermined time has elapsed after they are supplied with a final recognition
result, they decide action for the robot 1 to do next by referring to the parametric value of an appropriate one of
the emotions held in the emotion module 73 and the parametric value of an appropriate one of the desires held in the
instinct module 74 as necessary as will further be described later, and output the decision result to the action switching
module 71.

[0111] Note that in this embodiment of the present invention, the action modules 70₁ to 70ₙ adopt, as a means for
decision of next action, an algorithm called "finite probabilistic automaton" to stochastically decide from which one of
the nodes (states) NODE₀ to NODEₙ shown in FIG. 13 a transition is made and to which other one of them it is made,
based on transition probabilities set for arcs ARC₁ to ARCₙ, respectively, which provide connections between the
nodes NODE₀ to NODEₙ.

[0112] More specifically, each of the action models 70₁ to 70ₙ has a state transition table 80 as shown in FIG. 14,
corresponding to each of the nodes NODE₀ to NODEₙ included in the action models 70₁ to 70ₙ, respectively.

[0113] That is, in the state transition table 80, input events (recognition results) taken as conditions for transition
between the nodes NODE₀ to NODEₙ are entered in lines covered by an "Input event name" column in the order of
precedence, and additional conditional data to the transition conditions are entered in lines covered by "Data name"
and "Data range" columns.

[0114] Therefore, as shown in the state transition table 80 in FIG. 14, it is a condition for a node NODE₁₀₀ to transit
to another node that, when the result of recognition "BALL (the robot has detected the ball)" is given, "SIZE (ball size)"
given together with the result of recognition is "0, 1000 (0 to 1000)". Also, the node NODE₁₀₀ can transit to another
node when "OBSTACLE (the robot 1 has detected an obstacle)" is given as a result of the recognition and "DISTANCE
(distance between the obstacle and robot 1)" given along with the result of recognition is "0, 100 (0 to 100)".
[0115] Also, the node NODE₁₀₀ can transit to another node, even with entry of no result of recognition, when any of
the parametric values of the emotions "JOY", "SURPRISE" and "SADNESS" held in the emotion model 73, among the
emotions and desires held in the emotion model 73 and instinct model 74, respectively, to which the action models 70₁
to 70ₙ cyclically refer, takes a value of "50, 100 (50 to 100)".

[0116] In the state transition table 80, names of the nodes to which each of the nodes NODE₀ to NODEₙ can transit
are given in a "Transition destination nodes" line covered by a "Probability of transition to other node (Di)" column;
probabilities of the transition to other nodes NODE₀ to NODEₙ, which would be possible when all the requirements
given in the "Input event name", "Data name" and "Data range" columns, respectively, are satisfied, are given in
corresponding places in the "Probability of transition to other node (Di)" column; and actions to be outputted when the
state transits to the nodes NODE₀ to NODEₙ are given in an "Output action" line covered by the "Probability of
transition to other node (Di)" column. Note that the sum of the transition probabilities in each line covered by the
"Probability of transition to other node (Di)" column is 100 (%).

[0117] Therefore, the node NODE₁₀₀ in the state transition table 80 shown in FIG. 14 can transit to a node NODE₁₂₀
(node 120) with a transition probability of "30%" when "BALL (the ball has been detected)" is given as a result of the
recognition and "SIZE (the size of the ball)" given along with the result of recognition is "0, 1000 (0 to 1000)". At this
time, the robot 1 will make action "ACTION 1".

[0118] Each of the action models 70₁ to 70ₙ is configured so that many nodes NODE₀ to NODEₙ, each described by
such a state transition table 80, link to each other. For example, when a recognition result is supplied from the input
semantics converter module 59, the state transition table of the corresponding one of the nodes NODE₀ to NODEₙ is
used to stochastically decide next action and the decision result is outputted to the action switching module 71.
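The table lookup and stochastic transition described in paragraphs [0113] to [0118] can be sketched as follows. This is only an illustrative sketch, not the implementation of the robot 1: the dictionary layout, the function name `next_node`, the node `NODE_101`, the action "MOVE BACK" and the 70 % "stay in place" entry are assumptions introduced here. Only the BALL rule (30 % to NODE₁₂₀ with "ACTION 1") and the data ranges are taken from the text.

```python
import random

# Hypothetical encoding of part of the state transition table for NODE_100.
# Only the BALL rule (30 % to NODE_120 with "ACTION 1") comes from the text;
# the remaining entries are made up so that each row sums to 100 %.
TRANSITION_TABLE = {
    "NODE_100": [
        {
            "event": "BALL",
            "data_name": "SIZE",
            "data_range": (0, 1000),
            # (destination node, probability in %, output action)
            "transitions": [
                ("NODE_120", 30, "ACTION 1"),
                ("NODE_100", 70, None),  # assumed: otherwise stay in place
            ],
        },
        {
            "event": "OBSTACLE",
            "data_name": "DISTANCE",
            "data_range": (0, 100),
            "transitions": [("NODE_101", 100, "MOVE BACK")],  # assumed
        },
    ],
}


def next_node(node, event, value, rng=random):
    """Return (destination, action); stay at `node` if no rule matches."""
    for rule in TRANSITION_TABLE.get(node, []):
        low, high = rule["data_range"]
        if rule["event"] == event and low <= value <= high:
            transitions = rule["transitions"]
            weights = [prob for _, prob, _ in transitions]
            if sum(weights) != 100:
                raise ValueError("transition probabilities must sum to 100 %")
            # Draw one destination according to the transition probabilities.
            dest, _, action = rng.choices(transitions, weights=weights, k=1)[0]
            return dest, action
    return node, None
```

Calling `next_node("NODE_100", "BALL", 500)` returns the pair for NODE_120 in roughly 30 % of the calls, while a ball size outside the 0 to 1000 range leaves the automaton at NODE_100.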

[0119] The action switching module 71 shown in FIG. 11 selects predetermined high-priority ones of the pieces of
action outputted from the action models 70₁ to 70ₙ in the action model library 70, and sends a command for execution
of the action (will be referred to as "action command" hereunder) to the output semantics converter module 68 of the
middleware layer 40. Note that in this embodiment, the action outputted from the lower ones of the action models 70₁
to 70ₙ in FIG. 12 are set higher in priority than those outputted from the higher action models.
[0120] When action is complete, the action switching module 71 informs the learning module 72, emotion module
73 and instinct module 74 of the completion of the action based on action-completion information supplied from the
output semantics converter module 68.
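The priority-based selection in paragraph [0119] might look like the sketch below. The `Proposal` type, the numeric `priority` field (larger meaning higher priority) and the example model names are assumptions made for illustration; the text states only that action from the lower action models in FIG. 12 takes priority.

```python
from typing import NamedTuple, Optional


class Proposal(NamedTuple):
    model_name: str          # hypothetical label for an action model
    priority: int            # assumed: a larger number means higher priority
    action: Optional[str]    # None when the model proposes nothing


def select_action(proposals):
    """Return the action of the highest-priority model that proposed one."""
    candidates = [p for p in proposals if p.action is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.priority).action


proposals = [
    Proposal("expression of an emotion", 1, "bark"),
    Proposal("avoiding an obstacle", 2, None),
    Proposal("recovery from overturn", 3, "stand up"),
]
selected = select_action(proposals)  # "stand up"
```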

[0121] On the other hand, the learning module 72 is supplied with the result of recognition of a teaching given by the
user to the robot as action such as "hit", "patted" or the like, among recognition results supplied from the input semantics
converter module 59.

[0122] The learning module 72 changes the transition probability for a corresponding one of the action models 70₁
to 70ₙ in the action model library 70. For example, the learning module 72 reduces the expression probability of an
action when the robot 1 is hit (= scolded), and raises the expression probability when the robot 1 is patted (= praised),
both on the basis of the recognition results and information from the action switching module 71.
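One possible way to realize the adjustment in paragraph [0122] is sketched below. The step size of 5 percentage points and the proportional rescaling of the other destinations are assumptions; the text says only that the probability is lowered when the robot is hit and raised when it is patted, and that a row of transition probabilities sums to 100 %.

```python
def reinforce(probabilities, expressed, feedback, step=5.0):
    """Return a new row of transition probabilities (in %, summing to 100).

    probabilities: dict mapping destination node to probability in %.
    expressed: destination whose action was just expressed.
    feedback: "patted" raises its probability, "hit" lowers it.
    step: hypothetical adjustment size in percentage points.
    """
    if feedback == "patted":
        delta = step
    elif feedback == "hit":
        delta = -step
    else:
        return dict(probabilities)

    others = [node for node in probabilities if node != expressed]
    if not others:
        return {expressed: 100.0}

    updated = dict(probabilities)
    updated[expressed] = min(100.0, max(0.0, probabilities[expressed] + delta))

    # Rescale the other destinations so that the row still sums to 100 %.
    remaining = 100.0 - updated[expressed]
    old_rest = sum(probabilities[node] for node in others)
    for node in others:
        if old_rest > 0:
            updated[node] = remaining * probabilities[node] / old_rest
        else:
            updated[node] = remaining / len(others)
    return updated
```

With a row of 30 % for NODE_120 and 70 % for NODE_100, a "hit" after the NODE_120 action yields 25 % and 75 %, and a "patted" yields 35 % and 65 %.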
[0123] On the other hand, the emotion model 73 holds a parameter indicating the intensity of each of a total of six
emotions "joy", "sadness", "anger", "surprise", "disgust" and "fear". The emotion model 73 cyclically renews the
parametric values of these emotions on the basis of a specific recognition result such as "hit", "patted" or the like supplied
from the input semantics converter module 59, elapsed time and information from the action switching module 71.
[0124] More particularly, the emotion model 73 uses a predetermined computation equation to compute a variation
of the emotion at a time from a recognition result supplied from the input semantics converter module 59, behavior of
the robot 1 at that time, and an elapsed time from the preceding renewal. Then, taking the emotion variation as ΔE[t],
the current parametric value of the emotion as E[t] and a factor indicating the sensitivity to the emotion as kₑ, the emotion
module 73 determines a parametric value E[t+1] of the emotion in a next cycle by computing an equation (4), and
replaces the current parametric value E[t] of the emotion with the computed value E[t+1], to thereby renew
the parametric value of the emotion. The emotion model 73 also renews the parametric values of all the remaining
emotions in the same manner.

$$E[t+1] = E[t] + k_e \times \Delta E[t] \tag{4}$$

[0125] Note that it is predetermined how much each recognition result and the information from the output semantics
converter module 68 influence the variation ΔE[t] of the parametric value of each emotion. The predetermination is
such that, for example, the result of recognition of "hit" will have a great influence on the variation ΔE[t] of the parametric
value of the "anger" emotion, while the result of recognition of "patted" will have a great influence on the variation
ΔE[t] of the parametric value of the "joy" emotion.
[0126] The information from the output semantics converter module 68 is so-called feedback information on action
(action-completion information). Namely, it is information on the expression result of action. The emotion model 73 will
change the emotion with such information. For example, "barking" action will lower the level of "anger" emotion. Note
that information from the output semantics converter module 68 is also supplied to the aforementioned learning module
72 which renews the transition probability for an appropriate one of the action models 70₁ to 70ₙ based on the
information from the output semantics converter module 68.

[0127] Note that the result of action may be fed back by an output (action having a feeling added thereto) of the
action switching module 71.

[0128] On the other hand, the instinct model 74 holds parameters indicating the intensity of each of a total of four
independent desires "exercise", "affection", "appetite" and "curiosity". The instinct model 74 cyclically renews the
parametric values of these desires on the basis of recognition results supplied from the input semantics converter module
59, elapsed time and information from the action switching module 71.

[0129] More particularly, the instinct model 74 uses a predetermined computation equation to compute a variation
of each of the desires "exercise", "affection" and "curiosity" at a time from a recognition result supplied from the input
semantics converter module 59, elapsed time and information from the output semantics converter module 68. Then,
taking the desire variation as ΔI[k], the current parametric value of the desire as I[k] and a factor indicating the sensitivity
to the desire as kᵢ, the instinct module 74 determines a parametric value I[k+1] of the desire in a next cycle by computing
an equation (5) cyclically, and replaces the current parametric value I[k] of the desire with the computed value I[k+1],
to thereby renew the parametric value of the desire. The instinct model 74 also renews the parametric values of all the
remaining desires except for "appetite" in the same manner.

$$I[k+1] = I[k] + k_i \times \Delta I[k] \tag{5}$$

[0130] Note that it is predetermined how much each recognition result and the information from the output semantics
converter module 68 influence the variation ΔI[k] of the parametric value of each desire. The predetermination is such
that, for example, information from the output semantics converter module 68 will have a great influence on the variation
ΔI[k] of the parametric value of the "tired" state.

[0131] Note that in this embodiment, the parametric value of each of the emotions and desires (instincts) is limited
to a range of 0 to 100, and the factors kₑ and kᵢ are set individually for each of the emotions and desires.
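Equations (4) and (5) share the same form, so one cycle of renewal, together with the 0 to 100 limit of paragraph [0131], can be illustrated with a short sketch. The starting values, the variations attributed to a "patted" recognition result and the sensitivity factors are hypothetical numbers chosen only for the example; the text does not state them.

```python
def renew(current, variation, sensitivity, low=0.0, high=100.0):
    """One cycle of equation (4) or (5), limited to the 0 to 100 range."""
    return min(high, max(low, current + sensitivity * variation))


# Hypothetical values: the source fixes neither the variations nor the factors.
emotions = {"joy": 50.0, "anger": 40.0}
variation_for_patted = {"joy": 20.0, "anger": -5.0}  # assumed delta E[t]
k_e = {"joy": 0.5, "anger": 0.5}                     # assumed sensitivity

emotions = {
    name: renew(value, variation_for_patted[name], k_e[name])
    for name, value in emotions.items()
}
# joy:   50 + 0.5 * 20   = 60.0
# anger: 40 + 0.5 * (-5) = 37.5
```

The same `renew` function applies to a desire by passing I[k], ΔI[k] and kᵢ instead.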
[0132] On the other hand, as shown in FIG. 10, the output semantics converter module 68 of the middleware layer 
40 supplies, to an appropriate one of the signal processing modules 61 to 67 of the output system 69, an abstract 
action command like "move forward", "joy", "bark" or "track (a ball)" supplied from the action switching module 71 of 
the application layer 41 as described above.
[0133] Based on the action command supplied, the signal processing modules 61 to 67 generate a servo instruction
for supply to an appropriate one of the actuators 25₁ to 25ₙ, speech data for a speech to be outputted from the speaker
24 (see FIG. 8) or drive data for supply to the LEDs as "eyes" of the robot for carrying out the action, and sequentially
send the data to an appropriate one of the actuators 25₁ to 25ₙ, the speaker 24 or the LEDs via the virtual robot 33 in
the robotic server object 32 and signal processing circuit 14 (see FIG. 8) in this order.
[0134] As in the above, the robot 1 is adapted to autonomously act correspondingly to its own internal state and
environmental (external) state or an instruction or action from the user on the basis of the control program.

(3) Changing of the instinct and emotion corresponding to the environment

[0135] The robot 1 is also adapted to be cheerful in a bright environment for example, while being quiet in a dark
environment. Namely, the robot 1 is adapted to have the emotion and instinct thereof changed correspondingly to the
extent of each of three factors "noise", "temperature" and "illumination" in the environment surrounding the robot 1 (will
be referred to as "environmental factors" hereunder).

[0136] More particularly, the robot 1 has external sensors to detect the surroundings, including a temperature sensor
or thermosensor (not shown) to detect the ambient temperature in addition to the aforementioned CCD camera 20,
distance sensor 22, touch sensor 21 and microphone 23, each disposed in place. The recognition system 60 of the
middleware layer 40 includes the signal processing modules 50 to 52 for noise detection, temperature detection and
brightness detection corresponding to the above sensors, respectively.

[0137] The noise detecting signal processing module 50 detects the level of ambient noise based on speech data
given from the microphone 23 (see FIG. 8) via the virtual robot 33 in the robotic server object 32, and outputs the
detection result to the input semantics converter module 59.

[0138] The temperature detecting signal processing module 51 detects an ambient temperature based on sensor
data supplied from the thermosensor via the virtual robot 33, and outputs the detection result to the input semantics
converter module 59.

[0139] The brightness detecting signal processing module 52 detects an ambient Illumination based on image data 
supplied from the CCD camera 20 (see FIG. 8) via the virtual robot 33. and outputs the detection result to the input 
semantics converter module 59. 

[0140] The input semantics converter module 59 recognizes the extent of each of the ambient "noise", "temperature" 
and "illumination" based on the outputs from the signal processing modules 50 to 52, and outputs the recognition result 
to the intemal state model of the application module 41 (see FIG. 11). 

[0141] More specifically, the input semantics converter module 59 recognizes the extent of ambient "noise" based
on an output from the noise detecting signal processing module 50, and outputs a recognition result like "noisy" or
"quiet" to the emotion model 73, instinct model 74, etc.

[0142] Also, the input semantics converter module 59 recognizes the extent of ambient "temperature" based on an
output from the temperature detecting signal processing module 51, and outputs a recognition result like "hot" or "cold"
to the emotion model 73, instinct model 74, etc.

[0143] Further, the input semantics converter module 59 recognizes the extent of ambient "illumination" based on an
output from the brightness detecting signal processing module 52, and outputs a recognition result like "bright" or "dark"
to the emotion model 73, instinct model 74, etc.
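
The conversion from detection results to symbolic recognition results described above can be sketched as follows. This is only an illustrative sketch: the function name, the normalized 0.0 to 1.0 sensor levels and all threshold values are assumptions, since the text gives no numeric criteria.

```python
from typing import Optional

# Hypothetical thresholds on normalized sensor levels (0.0 to 1.0).
# The source text does not state numeric values.
NOISE_LEVEL = 0.6
HOT_LEVEL = 0.7
COLD_LEVEL = 0.3
BRIGHT_LEVEL = 0.5


def recognize_environment(noise: float, temperature: float, illumination: float) -> dict:
    """Map the three detection results to symbolic recognition results."""
    temp_label: Optional[str] = None  # None when neither "hot" nor "cold" applies
    if temperature >= HOT_LEVEL:
        temp_label = "hot"
    elif temperature <= COLD_LEVEL:
        temp_label = "cold"
    return {
        "noise": "noisy" if noise >= NOISE_LEVEL else "quiet",
        "temperature": temp_label,
        "illumination": "bright" if illumination >= BRIGHT_LEVEL else "dark",
    }
```

In this sketch the returned labels would then be passed on to the emotion model and the instinct model.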

[0144] The emotion model 73 cyclically changes the parametric value of each of the emotions by computing the
equation (4) based on the various recognition results supplied from the input semantics converter module 59 as in the
above.

[0145] Then the emotion model 73 increases or decreases the value of the factor k_e in the equation (4) for a
predetermined appropriate emotion based on the recognition results as to the "noise", "temperature" and "illumination"
supplied from the input semantics converter module 59.

[0146] More particularly, for example, when a recognition result "noisy" is supplied, the emotion model 73 will increase
the value of the factor k_e for the "anger" emotion by a predetermined number. On the other hand, when the recognition
result supplied is "quiet", the emotion model 73 will decrease the value of the factor k_e for the "anger" emotion by a
predetermined number. Thereby, the parametric value of the "anger" emotion will be changed under the influence of
the ambient "noise".

[0147] Also, when a recognition result "hot" is supplied, the emotion model 73 will decrease the value of the factor
k_e for the "joy" emotion by a predetermined number. On the other hand, when the recognition result supplied is "cold", the
emotion model 73 will increase the value of the factor k_e for the "sadness" emotion by a predetermined number. Thus,
the parametric value of the "sadness" emotion will be changed under the influence of the ambient "temperature".

[0148] Further, when a recognition result "bright" is supplied, the emotion model 73 will increase the value of the
factor k_e for the "joy" emotion by a predetermined number. On the other hand, when the recognition result supplied is "dark",
the emotion model 73 will increase the value of the factor k_e for the "fear" emotion by a predetermined number. Thus,
the parametric value of the "fear" emotion will be changed under the influence of the ambient "illumination".
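
A minimal sketch of how such factor adjustments might be organized is given below. It assumes that the cyclic update has the form E[t+1] = E[t] + k_e × ΔE[t], which is not reproduced in this section, and the step size, the names and the rule table are hypothetical. Only the noise and temperature rules are listed; illumination rules could be added to the table in the same way.

```python
STEP = 0.1  # hypothetical "predetermined number"

# recognition result -> list of (emotion, signed change applied to its factor)
RULES = {
    "noisy": [("anger", +STEP)],
    "quiet": [("anger", -STEP)],
    "hot": [("joy", -STEP)],
    "cold": [("sadness", +STEP)],
}


def adjust_factors(k_e: dict, result: str, rules: dict = RULES) -> dict:
    """Return a copy of the per-emotion factors adjusted for one recognition result."""
    updated = dict(k_e)
    for emotion, delta in rules.get(result, []):
        updated[emotion] = updated[emotion] + delta
    return updated


def update_emotion(value: float, factor: float, delta_e: float) -> float:
    """Assumed form of the cyclic update: E[t+1] = E[t] + k_e * dE[t]."""
    return value + factor * delta_e
```

With a larger factor, the same stimulus ΔE moves the parametric value further, which is how a "noisy" environment would make the "anger" emotion rise more readily in this sketch.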
[0149] Similarly, the instinct model 74 cyclically changes the parametric value of each of the desires by computing
the equation (5) based on various recognition results supplied from the input semantics converter module 59 as in the
above.

[0150] Also, the instinct model 74 increases or decreases the value of the factor k_i in the equation (5) for a
predetermined appropriate desire based on the recognition results as to the "noise", "temperature" and "illumination" supplied
from the input semantics converter module 59.

[0151] Also, for example, when recognition results "noisy" and "bright" are supplied, the instinct model 74 will
decrease the value of the factor k_i for the "tired" state by a predetermined number. On the other hand, when the recognition
results supplied are "quiet" and "dark", the instinct model 74 will increase the value of the factor k_i for the "tired" state
by a predetermined number. Further, for example, when a recognition result "hot" or "cold" is supplied, the instinct model 74 will
increase the value of the factor k_i for the "tired" state by a predetermined number.
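
The combined conditions for the "tired" state can be sketched as below. The function name and the step size are assumptions; the text only says that the factor changes "by a predetermined number".

```python
STEP = 0.1  # hypothetical "predetermined number"


def adjust_tired_factor(k_tired: float, results: set) -> float:
    """Adjust the factor for the "tired" state from a set of recognition results."""
    if {"noisy", "bright"} <= results:
        k_tired -= STEP  # noisy and bright: tiredness grows less readily
    elif {"quiet", "dark"} <= results:
        k_tired += STEP  # quiet and dark: tiredness grows more readily
    if "hot" in results or "cold" in results:
        k_tired += STEP  # either temperature extreme increases the factor
    return k_tired
```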

[0152] Thus, when the robot 1 is in a "noisy" environment, for example, the parametric value of the "anger" emotion
tends to increase while that of the "tired" state tends to decrease, so that the robot 1 will make a generally "irritated"
action. On the other hand, when the environment surrounding the robot 1 is "quiet", the parametric value of the "anger"
emotion tends to decrease while that of the "tired" state tends to increase, so that the robot 1 will act generally "gently".

[0153] Also, when the robot 1 is in a "hot" environment, the parametric value of the "joy" emotion tends to decrease
while that of the "tired" state tends to increase, so the robot 1 will show generally "lazy" action. On the other hand,
when the robot 1 is in a "cold" environment, the parametric value of the "sadness" emotion tends to increase while
that of the "tired" state tends to increase, so the robot 1 will act generally "with a complaint of the cold".

[0154] When the robot 1 is in a "bright" environment, the parametric value of the "joy" emotion tends to increase
while that of the "tired" state tends to decrease, so that the robot 1 will show generally "cheerful" action. On the other
hand, in a "dark" environment, the parametric value of the "joy" emotion tends to increase while that of the "tired"
state tends to increase, so that the robot 1 will act generally "gently".

[0155] The robot 1 is constructed as in the above and can have the emotion and instinct thereof changed corre- 
spondingly to the surrounding environment and autonomously act according to the states of its emotion and instinct. 

(4) Applications of the present invention

(4-1) General description 

[0156] The essential parts of the robot 1 to which the present invention is applied will be described herebelow. The
robot 1 is constructed to learn action in correlation with an image signal and speech signal (acoustic signal), and start
up action with the image signal and speech signal correlated with the action after the learning. In the following, mainly
an example that the robot 1 is made to learn action in correlation with a speech or voice will be described. However,
the robot 1 can of course also be made to learn action in correlation with an image. More particularly, according to the
present invention, the robot 1 is constructed as in the following.

[0157] As shown in FIG. 15, the robot 1 includes a speech recognition unit 101, sensor data processor 102,
instinct/emotion storage unit 103, associative memory/recall memory 104 and an action generator 105.

[0158] The speech recognition unit 101 functions as an input information detector to detect information supplied
simultaneously with, or just before or after, a touch detection by a touch sensor (e.g., touch sensor 21 in FIG. 8) which
detects a touch on the robot 1. The associative memory/recall memory 104 stores action made in response to the
touch and input information (speech signal) detected by the speech recognition unit 101 in correlation with each other.
The action generator 105 works as an action controller to provide the action which is stored in the associative memory/recall
memory 104 and associated with newly acquired input information (speech signal). Also, the sensor data processor
102 works to have the robot 1 make action in response to a touch detection by the touch sensor (not shown), for
example. More specifically, each of these robotic components functions as will be described below.

[0159] The speech recognition unit 101 processes speech signals supplied from outside (microphone 23) to
recognize them as a predetermined language. More particularly, the speech recognition unit 101 adopts the HMM to recognize
an input speech as a phoneme sequence by a plurality of recognition classes based on the HMM.

[0160] The speech recognition unit 101 is also capable of generating an increased number of classes based on the
existing classes through learning. For example, when an unrecognizable speech is given, the speech
recognition unit 101 divides the existing class to generate new classes as shown in FIG. 6B. More particularly, the
speech recognition unit 101 divides an existing class having a certainty factor (belongingness evaluation) for an input
speech to generate new classes. For example, a part of a class, having a small feature, is divided to provide new
classes. Thus, the speech recognition unit 101 can recognize pre-registered languages as well as a new language.
[0161] The sensor data processor 102 generates a signal for a motion (action) taught to the robot based on a change
of sensor data. That is, the sensor data processor 102 recognizes input action information.

[0162] The motion to be taught to the robot may be a preset one, for example, and it may also be a new one set by
the user. Also, any one of the motions already set may be selectively generated.

[0163] Teaching of a preset motion to the robot is such that the preset motion of the robot is triggered by entry of
sensor data from the touch sensor, for example. For example, the robot is preset to shift from a "standing" position
to a "sitting" position when a predetermined touch sensor provided at the rear back portion of the robot is pushed.
Namely, the robot is taught to actually shift to the "sitting" position when the touch sensor at the rear back portion of
the robot in the "standing" position is pushed.

[0164] Note that a sensor for teaching such a motion to the robot may be provided at the end of the head or leg. By
providing such sensors in arbitrary positions, it is possible to teach a variety of motions to the robot.

[0165] Also, teaching of a newly set motion to the robot can be done using a change of a control signal for a moving
part (joint), for example. The moving parts include, for example, the actuators (servo motors) 25_1 to 25_n provided at the
joints of the leg units 3A to 3D, articulations between the leg units 3A to 3D and body unit 2, articulation between the
head unit 4 and body unit 2, articulation between the tail unit 5 and tail 5A, etc.

[0166] For example, when a moving part of the robot 1 is forced by the user to move, a load is applied to the
moving part. The load on the moving part will cause a signal, for example a servo signal to the moving part, to differ
from the one which takes place during a normal motion of the moving part (with no external load). A change in posture
of the robot 1, namely, a motion of the robot 1, can be known from such a signal. Thus, by storing the signal, a motion
urged by the user can be taught as a new motion to the robot 1. Teaching of such a new motion will further be described
later. Also, according to the present invention, the robot 1 is adapted to detect an external force (external load) from
such a signal change and thus learn the external force as will further be described later.

[0167] Further, the sensor data processor 102 can recognize the class of action the robot 1 has to learn. For example,
the robot 1 can learn input action information by recognizing the class thereof based on a feature thereof in an action
feature space.

[0168] The instinct/emotion storage unit 103 stores information on emotions correlated with the aforementioned
speech and action. That is, the instinct/emotion storage unit 103 changes the instinct or emotion with an input sensor
signal or the like from the instinct model or emotion model as in the above.

[0169] The associative memory/recall memory 104 performs learning based on information from the aforementioned
speech recognition unit 101, sensor data processor 102 and instinct/emotion storage unit 103, and thereafter it
generates action information corresponding to input speech and image on the basis of the learning. The associative
memory/recall memory 104 employs a conventional principle of associative memory based on which image and speech
classes are correlated with each other, having previously been described concerning the equations (1) and (2), and
associatively stores each piece of information.

[0170] For example, when the sensor data processor 102 detects, from sensor data, a teaching of a motion for 
shifting from the "standing" position to "sitting" position and the speech recognition unit 101 recognizes an uttered 
language "backward" simultaneously with, or just before or after, the detection of the teaching by the sensor data 
processor 102, the associative memory/recall memory 104 will store (learn) the motion for shifting from the "standing"
to "sitting" position and the uttered language "backward" in association with each other. This is generally the same as 
the teaching of "sitting" to a real dog. 

[0171] Also, the associative memory/recall memory 104 can be adapted to learn an input motion by correlating the
input motion and an input language with each other (triggering) only when the input motion and language are preset
as a pair. For example, if the uttered language "backward" is given simultaneously with, or just before or after, the teaching
of the above-mentioned so-called "sitting" motion, the associative memory/recall memory 104 will learn (store) the
motion in correlation (association) with the uttered language "backward". However, it will not learn the motion if the
input language is any other than "backward".
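
The gating described above, in which a motion is stored only when a word arrives close in time to the teaching and, optionally, only for preset pairs, can be sketched as follows. A plain dictionary stands in for the associative memory here; the class name, the time window value and the method names are all assumptions made for illustration, not the memory structure of the source.

```python
from typing import Optional

WINDOW = 1.0  # hypothetical window (seconds) for "simultaneously with, or just before or after"


class AssociativeMemory:
    """Dictionary-based stand-in for the associative memory/recall memory."""

    def __init__(self, allowed_pairs: Optional[set] = None):
        # When given, only these (word, motion) pairs are learned.
        self.allowed_pairs = allowed_pairs
        self.word_to_motion = {}

    def learn(self, word: str, word_time: float, motion: str, motion_time: float) -> bool:
        if abs(word_time - motion_time) > WINDOW:
            return False  # word and teaching are too far apart in time
        if self.allowed_pairs is not None and (word, motion) not in self.allowed_pairs:
            return False  # not a preset pair
        self.word_to_motion[word] = motion
        return True

    def recall(self, word: str) -> Optional[str]:
        return self.word_to_motion.get(word)
```

For example, with `AssociativeMemory({("backward", "sit")})`, teaching "sit" together with the word "backward" is stored, while the same teaching with any other word is ignored.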

[0172] Also, the associative memory/recall memory 104 can learn a motion by correlating a recognized motion or
language with the instinct and emotion outputted from the instinct/emotion storage unit 103. For example, if a scared
feeling (fear) is outputted while the associative memory/recall memory 104 is learning a motion at entry of a speech
(uttered language), it can learn (store) such a speech in correlation with such a "fear" emotion.

[0173] As in the above, the associative memory/recall memory 104 learns (stores) a speech, motion or emotion in
correlation (association) with each other, and after the learning, it will generate action information correspondingly to
an input image, speech, etc. based on the learning result.

[0174] The action generator 105 generates action based on the action information output from the above associative
memory/recall memory 104. For example, when an uttered language "backward" is given to the associative memory/
recall memory 104 after learning of the aforementioned teaching of "backward", the action generator 105 will cause a
shift from the "standing" position to the "sitting" position.

[0175] As in the above, the robot 1 will be able to learn action in association with speech information and a change of
sensor signal (data), and to make the action (motion) resulting from the learning on the basis of an input speech.

[0176] A series of operations of the robot 1 from learning of a "sitting" motion until outputting the motion, for example,
will be described below.

[0177] During learning, the robot 1 is given a touch simultaneously with, or just before or after, giving a speech
(acoustic signal) as shown in FIG. 16A. The speech signal is, for example, "backward". Supplying the touch signal is
equal to teaching of a motion for shifting from the "standing" position to the "sitting" position to change a sensor signal
from a moving part related with the motion. Note that a touch sensor or pushbutton (e.g., teaching button for "sitting")
may be provided at a predetermined location as in the above to teach the motion to the robot by operating (pushing)
the touch sensor or pushbutton. In this case, supplying a touch signal means generation of a signal by operating such
a touch sensor.

[0178] With the above teaching operation, the robot 1 will be taught to make a shifting motion from (A-1) to (A-2) in
FIG. 16A.

[0179] After being thus taught, the robot 1 will shift to the "sitting" position as in (A-2) in FIG. 16A, which is taught during
learning, as shown in (B-2) in FIG. 16B when given an uttered language (acoustic signal) taught to the robot 1 during
learning, for example, "backward" as in (B-1) in FIG. 16B.

[0180] The motion to be taught to the robot is not limited to the aforementioned one. That is, simultaneously with, or
just before or after, giving a speech (utterance), the robot 1 may be pushed forward on the back, have the neck raised
upward or pushed down, or raised at the front legs in order to teach such motions to the robot 1. By association of this
teaching of a motion with a corresponding uttered language, a motion "prone lying", "standing" or "shaking", for example,
can be taught to the robot 1.

[0181] Also, the robot 1 can learn as follows for example. 

[0182] First, the robot 1 will learn to "kick" in learning a motion. More particularly, the user (trainer) will operate the
front legs and teach the robot 1 to "kick" a thing. The motion to be learned by the robot 1 may be a preset one or a
new one. On the other hand, an uttered language "red" learned through the language recognition and the red color
recognized based on an image are stored in association with each other.

[0183] As a result of such learning, the robot 1 will recognize an uttered language "kick a red thing" and kick a
"red" thing as generated action. For example, the red object is recognized by segmentation of an input image and
identifying red portions of the image. That is, the robot 1 will recognize a thing consisting of red segments as an object
to kick.

[0184] In the above embodiment, speech information is associated with action but the present invention is not limited
to this association. For example, image information may be associated with the action information. In this case, the
robot 1 is provided with an image recognition unit for recognizing a specific image from image signals supplied from
an imaging unit such as the CCD camera 20, for example.

[0185] Also, the above embodiment has been described concerning the association of an instinct or emotion
outputted from the instinct/emotion storage unit 103 with learned action and uttered language. However, the present invention
is not limited to this association. For example, an instinct or emotion taking place later may be linked with preset action
and uttered language.

[0186] Further, in the robot 1, an emotion caused by an output (actual action), an input which is a motivation of the
emotion (e.g., uttered language or image) and the output itself (actual action) can be stored (learned). Thus, in an
actual scene after the learning, the robot 1 can also recall a corresponding stored emotion from an input language or
the like to make predetermined action without providing any output (action) which should intrinsically be provided
correspondingly to the input language.

[0187] By storing (learning) an emotion caused when touching (action) a red thing (input), for example, a scared
feeling (fear) when sensing a high temperature, the robot 1 can recall the fear just when seeing a red thing (input) later,
thereby expressing the fear as action (making a predetermined action). Namely, the robot 1 can make any other
appropriate action without repeating any past action such as touching a red thing.

[0188] In this case, the associative memory/recall memory 104 will function to store action result information indicative
of the result of action made correspondingly to a speech signal detected by the speech recognition unit 101 and the
speech signal in association with each other, and the action generator 105 works to control action based on the action
result information identified by the associative memory/recall memory 104 based on a new input speech signal.

[0189] Also, the robot 1 can let another space of an input, emotion or action influence the feature space
of an input signal and thereby influence the classification of the input signal. That is, as shown in FIGS. 6A, 6B and
6C, when classes are near to each other in the feature space of image and speech, the image and speech are classified
referring to the third feature space (e.g., action feature space).

[0190] More specifically, suppose that the robot 1, having made a first action in response to a first input object (image)
characterized by image signals, was given a reward (e.g., it was "patted"), while it was punished (e.g., "hit") when it
made the same first action, as a result of the classification, in response to a second object very similar to the first
object in the image feature space (the first and second objects being near to each other in the image feature space).
In this case, the robot 1 is adapted to make any action other than the first action in response to a second and
subsequent entry of the second object. Namely, the result of classification (action result in this case) in another
feature space is used to influence another classification or strategy of classification.

[0191] In this case, the speech recognition unit 101 functions as an input information detector, a feature detector
to detect a feature of speech signals detected by the input information detector, and an information classification unit
to classify speech signals based on the feature. It should be noted that the classification of speech signals based
on a feature is equivalent to classification by the HMM. Note that the function as the feature detector is performed by a
feature extractor 122 which will further be described later with reference to FIG. 33, while the function as the information
classification unit is performed by an HMM unit 123 shown also in FIG. 33.

[0192] Further, the speech recognition unit 101 will have a function as a classification changer to change the
classification of speech signals (recognition classes) having caused action based on action result information indicative of
the result (e.g., reward or punishment) of the action made by the robot 1 under the control of the action generator 105.
Note that the learning by the association may be made by association of the action made by the robot 1 when praised with
a stimulus (speech, image, action or the like).

[0193] All the components of the robot 1 according to the present invention have been described in the foregoing.
Next, each of the components will further be described.


(4-2) Learning of arbitrary motion (detailed description of the sensor data processor)

[0194] As has been described in the foregoing, the robot 1 learns a motion (action) which is a preset motion or
an arbitrary motion. There will be described herebelow how the robot 1 learns an arbitrary motion, that is, a new motion.

[0195] The joints of the robot 1 are controlled by corresponding servo motors as in the above. In the robot 1, a time
series of angles is generated based on angle designations (angle designation information) from the CPU 10, and the
robot 1 shows a motion as a result.

[0196] Also, the servo controller provides signals including an actual joint angle supplied from the potentiometer
provided in each joint and a pulse signal supplied to each servo motor. The robot 1 is taught an arbitrary motion by the
use of the pulse signal instead of the sensor signal (sensor data) from a touch sensor, which is used to teach the
aforementioned preset motion to the robot 1.

[0197] To learn such an arbitrary motion, the robot 1 includes a discrimination unit 111 shown in FIG. 17A. The
discrimination unit 111, which corresponds to the sensor data processor 102 in FIG. 15, is constructed for the robot 1 to learn
an arbitrary motion. The discrimination unit 111 is dedicated for the robot 1 to learn a motion based on the pulse width
of a control signal supplied to each joint servo motor.

[0198] It should be noted here that the robot 1 is adapted to shift to various postures and thus will not keep any
constant posture when learning a motion. Therefore, the robot 1 has to be taught similar motions both in the "standing"
and "sitting" positions. Namely, the robot 1 has to be taught a motion by the use of a pulse width controlling the motion
of a moving part (joint), in association with each posture of the robot 1.

[0199] For this reason, the discrimination unit 111 includes a plurality of discriminators 111_1, 111_2, ... each for one
posture, as shown in FIG. 17B. For example, the first discriminator 111_1 is provided for learning a motion in the "sitting"
position, and the second discriminator 111_2 is provided for learning a similar motion in the "standing" position.

[0200] Based on information on a current posture of the robot 1, the discrimination unit 111 selects a desired one of
the plurality of discriminators 111_1, 111_2, ... for learning of a motion in an arbitrary posture of the robot 1.

[0201] Note that posture information indicative of the current posture can be detected from information, for example
gravity information, provided from the potentiometers 26_1 to 26_n, angular velocity sensor 18 or acceleration sensor 19.
Also, the current-posture information can be acquired based on a command outputted as a control signal for a moving
part from the action generator 105.
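
The selection of one discriminator per posture can be sketched as below. The class and method names are hypothetical, and each discriminator here merely collects teaching samples; the source does not specify its internals at this point.

```python
class Discriminator:
    """Collects teaching samples for one posture (hypothetical structure)."""

    def __init__(self, posture):
        self.posture = posture
        self.samples = []  # list of (pulse_width_vector, target_vector)

    def learn(self, pulse_widths, target):
        self.samples.append((list(pulse_widths), list(target)))


class DiscriminationUnit:
    """Holds one discriminator per posture and routes samples by current posture."""

    def __init__(self, postures):
        self.discriminators = {p: Discriminator(p) for p in postures}

    def select(self, posture):
        if posture not in self.discriminators:
            raise ValueError(f"no discriminator for posture {posture!r}")
        return self.discriminators[posture]

    def learn(self, posture, pulse_widths, target):
        self.select(posture).learn(pulse_widths, target)
```

Keeping a separate learner per posture means that the same push produces a sample only in the discriminator for the posture in which it occurred.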

[0202] Teaching (learning) is effected by comparison of a pulse width which takes place when no external force is
applied to the robot 1 and one which takes place when an external force is applied. Namely, the width of a pulse
signal supplied to each joint (servo motor) which is in the normal state (with no external force applied) takes a pattern
fixed within certain limits, while the width of a pulse signal supplied to the joint while an external force is being applied
to the robot 1 will have a different pattern from that shown by the joint in the normal state. For teaching a motion
to the robot 1 by applying an external force thereto, the above relation (difference between pulse-width patterns) is
used to acquire information on the motion. More particularly, motion teaching is effected as will be described below.

[0203] For example, when the robot 1 is recognized based on posture information to be in a standing position, a
pulse width, which takes place when an external force is applied to the robot 1 for teaching a motion, is supplied to the
first discriminator 111_1, to which information assigned to the motion is also supplied at the same time. For example, a
pulse width used for the motion teaching is that of a signal used for the so-called PID control as shown in the following
equation (6). More specifically, a PWM-controlled pulse width is used for this purpose.



$$P = P_g \times e_n + I_g \times \sum_{i=0}^{n} e_i \, \Delta t + D_g \times \frac{e_n - e_{n-1}}{\Delta t} \tag{6}$$

where e_i is an error at a time i (difference between the target angle and the current (actual) angle given by the
potentiometer), and P_g, I_g and D_g are constants. The pulse width used for the motion learning is the P value acquired
by computing the equation (6).
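
As an illustration, the P value can be computed as in the sketch below. It assumes the standard discrete PID form (a proportional term on the latest error, a summed-error term scaled by the time step, and a differenced-error term divided by the time step); the function name, argument names and sample values are hypothetical.

```python
def pid_pulse_width(errors, dt, p_gain, i_gain, d_gain):
    """Compute P from the error history e_0 .. e_n (latest error last).

    errors : differences between target angle and actual angle
    dt     : time step between samples (assumed constant)
    """
    if not errors:
        raise ValueError("at least one error sample is required")
    e_n = errors[-1]
    e_prev = errors[-2] if len(errors) > 1 else e_n  # no derivative on the first sample
    integral = sum(errors) * dt
    derivative = (e_n - e_prev) / dt
    return p_gain * e_n + i_gain * integral + d_gain * derivative


# Hypothetical values: errors 1, 2, 4 sampled every 0.5 time units.
p_value = pid_pulse_width([1.0, 2.0, 4.0], dt=0.5, p_gain=2.0, i_gain=1.0, d_gain=0.5)
```

When an external force holds a joint away from its target angle, the error terms grow and so does the computed value, which is why the pulse width pattern differs from that of the normal state.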

[0204] For example, vectors are used as information on a pulse width which takes place when an external force is
applied to the robot 1 for the purpose of motion teaching and on a to-be-taught motion, respectively. A five-dimensional
vector [V0, V1, V2, V3, V4] is used as information assigned to the to-be-taught motion. With the five elements V0, V1,
V2, V3 and V4 of the vector, it is possible to recognize five types of stimuli. The teaching will be detailed in the following.

[0205] When the robot 1 is pushed backward on the back, there are provided a vector P1 acquired from a pulse width
resulting at that time and intended-motion information O1 = [0, 1, 0, 0, 0]. As shown in FIG. 18, for example, the
discriminator 111_1 is supplied with the pulse-width vector (backward) P1 and [0, 1, 0, 0, 0].

[0206] Each of the vector elements V0, V1, V2, V3 and V4 is learned as a real number (with floating point) between
0 and 1. The larger the (learned) stimulus part, the closer the vector element is to 1. For example, a vector
is acquired as real numbers like [0.1, 0.9, 0.2, 0.1, 0.3] as a result of the motion learning with information O1 = [0, 1, 0, 0, 0].

[0207] Also, when the robot 1 is pushed forward on the back, there are provided a vector P2 acquired from a pulse
width resulting at that time and intended-motion information O2 = [0, 0, 1, 0, 0]. When the robot 1 has the neck pushed
down, there are provided a vector P3 acquired from a pulse width resulting at that time and intended-motion information
O3 = [0, 0, 0, 1, 0]. When the robot 1 has the neck pushed up, there are provided a vector P4 acquired from a pulse width
resulting at that time and intended-motion information O4 = [0, 0, 0, 0, 1]. Also, for example, there are provided a vector
P0 acquired from a pulse width which takes place when no external force is applied and intended-motion information
O0 = [1, 0, 0, 0, 0]. The vector P0 and information O0 are compared with the above vectors and information to learn
the intended motions.
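
The pairing of pulse-width vectors with five-element target vectors, and the reading of a real-valued output vector, can be sketched as follows. The learner is not specified at this point in the text, so an inverse-distance-weighted average of the stored targets is used here purely as a stand-in; the names and the short three-element pulse vectors are assumptions chosen for brevity.

```python
import math

# Order follows the target vectors O0 .. O4 described above.
STIMULI = ["none", "pushed backward", "pushed forward", "neck pushed down", "neck pushed up"]


def one_hot(index, size=5):
    """Target vector with 1 at the stimulus index and 0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def respond(samples, pulse_vector):
    """Blend stored targets by closeness; every output element lies between 0 and 1."""
    weights = [1.0 / (math.dist(stored, pulse_vector) + 1e-6) for stored, _ in samples]
    total = sum(weights)
    size = len(samples[0][1])
    return [
        sum(w * target[i] for w, (_, target) in zip(weights, samples)) / total
        for i in range(size)
    ]


def recognize(samples, pulse_vector):
    """Name of the stimulus whose output element is largest."""
    output = respond(samples, pulse_vector)
    return STIMULI[output.index(max(output))]


# Hypothetical teaching samples (pulse vector, target vector).
samples = [
    ([0.0, 0.0, 0.0], one_hot(0)),
    ([5.0, 0.0, 0.0], one_hot(1)),
    ([0.0, 5.0, 0.0], one_hot(2)),
]
```

A pulse vector close to the stored "pushed backward" sample yields an output whose second element is nearest to 1, in the spirit of the example values given above.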

[0208] Examples of the pulse width are shown in FIGS. 19 to 25, in which the horizontal axis shows positions of the
joint while the vertical axis shows values taken by the so-called PWM pulse.

[0209] FIG. 19 shows the pulse width (value of pulse signal) when the robot 1 is in the standing position. In the figures,
"FR1" indicates a position of the first joint (shoulder joint) of the front right leg, "FR2" indicates a position of the second
joint (knee joint) of the same leg, and "FR3" indicates a position of the third joint (ankle joint) of the same leg. "FL1"
indicates a position of the first joint (shoulder joint) of the front left leg, "FL2" indicates a position of the second joint
(knee joint) of the same leg, and "FL3" indicates a position of the third joint (ankle joint) of the same leg. "HR1" indicates
a position of the first joint (shoulder joint) of the rear right leg, "HR2" indicates a position of the second joint (knee joint)
of the same leg, and "HR3" indicates a position of the third joint (ankle joint) of the same leg. "HL1" indicates a position
of the first joint (shoulder joint) of the rear left leg, "HL2" indicates a position of the second joint (knee joint) of the same
leg, and "HL3" indicates a position of the third joint (ankle joint) of the same leg. "Head1", "Head2" and "Head3" indicate
positions of the neck joints, respectively. The above is also true for FIGS. 20 to 25. Thus, a total of 15 pulse widths
can be acquired when the robot 1 is in a state (posture or motion). That is, a vector P used for the aforementioned
learning can be obtained as one consisting of 15 elements.
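
Assembling the 15-element vector P from per-joint pulse widths can be sketched as below. The joint labels follow the figures; the function name and the dictionary input format are assumptions.

```python
# Twelve leg joints (four legs, three joints each) followed by three neck joints.
JOINT_NAMES = [
    f"{leg}{n}" for leg in ("FR", "FL", "HR", "HL") for n in (1, 2, 3)
] + ["Head1", "Head2", "Head3"]


def pulse_vector(readings: dict) -> list:
    """Order per-joint pulse widths into a fixed 15-element vector."""
    missing = [name for name in JOINT_NAMES if name not in readings]
    if missing:
        raise KeyError(f"missing joints: {missing}")
    return [float(readings[name]) for name in JOINT_NAMES]
```

Fixing the joint order once ensures that vectors taken in different postures or under different pushes remain comparable element by element.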

[0210] When the robot 1 in the standing position is pushed forward on the back, a pulse having a width as shown in
FIG. 20 is produced. When the robot 1 in the standing position is pushed backward on the back, the width of a
thus-produced pulse will be as shown in FIG. 21. When the robot 1 in the standing position is pushed up on the head, a
pulse having a width as shown in FIG. 22 is produced. When the robot 1 in the standing position is pushed down on
the head, there will be produced a pulse having a width shown in FIG. 23. When the robot 1 in the sitting position is
held at the right leg, a pulse having a width as shown in FIG. 24 will be produced. When the robot 1 in the sitting position
is held at the left leg, there is produced a pulse having a width as shown in FIG. 25. Based on these pulse widths, the
discrimination unit 111 detects the corresponding postures of the robot 1 for learning a motion.

[0211] The robot 1 also includes a pleasantness/unpleasantness judgment unit 112, as shown in FIG. 26, to enable
motion learning of the kind a real animal performs.

[0212] Receiving an output from the sensor data processor 102, the pleasantness/unpleasantness judgment unit
112 judges the data to be either an emotion value defining pleasantness or one defining unpleasantness,
and outputs corresponding action information. For example, when an emotion defining unpleasantness in the
emotion model 73 has a large value, the pleasantness/unpleasantness judgment unit 112 outputs action information which
will cause action to avoid the unpleasantness. When the robot 1 is pushed backward on the back, the
pleasantness/unpleasantness judgment unit 112 will judge the output from the sensor data processor 102 to be an emotion value
defining unpleasantness and output action information for shifting to the "sitting" position. Also, when the robot 1
is pushed forward on the back or down on the neck, the pleasantness/unpleasantness judgment unit 112 will judge the
data from the sensor data processor 102 to be an emotion value defining unpleasantness and output action
information for shifting to the "prone" position. When the robot 1 in the prone position has the neck raised upward, the
pleasantness/unpleasantness judgment unit 112 will judge the data from the sensor data processor 102 to be an
emotion value defining unpleasantness and output action information for shifting to the "sitting" position. When the robot
1 in the sitting position has the neck raised upward, the pleasantness/unpleasantness judgment unit 112 will judge the
data from the sensor data processor 102 to be an emotion value defining unpleasantness and output action
information for shifting to the "standing" position. That is to say, when the robot 1 is subjected to an external force so large
that it feels unpleasant, the robot 1 will make the above motions. The action generator 105 generates action based on
the above action information.

[0213] As in teaching a posture to a real dog or the like, the robot 1 to which action or an external force is applied will learn
to shift from the current position, in which it feels unpleasant under such handling, to another position.
[0214] Note that a motion is taught to the robot 1 by repeating the application of an external force or action a plurality
of times. Also, the learning or teaching is repeated for other postures (other discriminators). Each of the discriminators
is constructed for learning with a hierarchical neural network, for example. As shown in FIG. 27, a three-layer neural
network, for example, is composed of an input layer, a hidden layer and an output layer. In this case, the procedure for
learning or teaching is as outlined below.

[0215] In the input layer, a sensor signal or the like formed correspondingly to the input layer is supplied to each neuron.
In the hidden layer, features of the data transmitted via each neuron of the input layer are extracted. More particularly, each
neuron in the hidden layer attends to some feature of the input data and extracts it for evaluation. In the output layer, the features
from the neurons of the hidden layer are combined together to make a final decision.

[0216] For the above three-layer neural network, back-propagation-based learning is established and can
be adopted to construct a discriminator, for example. Thus, when the robot 1 is pushed backward on the back, [0, 1,
0, 0, 0] is supplied to the discriminator, which will output values (real numbers) approximating [0, 1, 0, 0, 0].
[...] an arbitrary motion via the discrimination unit 111. Thus, through learning [...] the robot 1 can make, in response
to a predetermined utterance (speech signal) given thereto, a motion it has learned correspondingly to the predetermined utterance.

(4-3) Learning of an external force applied to the robot 1

[0217] In the foregoing, the learning of an arbitrary motion has been described. In the learning of an arbitrary motion,
[...] an external force applied thereto and [...] the motion [...].

[0218] [...] learned the type of an applied external force, the robot 1 can make a predetermined
motion when the external force the robot 1 has learned is applied. More particularly, when the robot 1 having
[...] to a sitting position [...] to the waist thereof with an [...] input and make a sitting motion, for
example, as a predetermined motion. The learning of an external force will further be described herebelow.

[0219] For learning an external force, the robot 1 includes a moving member 151, a joint unit 152 to move the moving
member, [...] applied thereto via the moving part 151, and a learning unit 160 to learn the state of the joint unit 152,
detected by the detector 153, and the external [...] applied an external force, the type of the external force based on the state of
[...] head unit 4 and tail unit 5, joined to the body unit 2 [...] FIGS. 6 and 8. The joint unit 152 includes such
actuators, and more particularly, the motors forming the actuators.

[0220] Owing to the above construction, the robot 1 can learn an external force by the use of a PWM pulse supplied
[...]. [...] portions of the leg units 3A to 3D, head unit 4 and tail unit 5,
each joined with a joint to the body unit 2, are the moving members. Each of the leg units 3A to 3D is composed of a
[...] shoulder joint, knee joint and ankle joint [...] right and left, respectively [...]
from the actuators 25_1 to 25_n. The PWM pulse
signal is supplied to the motors (actuators 25_1 to 25_n).

[0221] The width of the PWM pulse signal depends upon the state of the joint (motor) 152 to which an external force
[...] a target angle and the actual one of each joint (motor). Thus, when an external force is applied to the robot 1, the error
[...] a larger external [...] by the use of such a PWM pulse
[...] pulse signal [...] of the joint unit 152 on which an external
[...] pulse signal is computed as an error or
difference between a target angle and the actual one of each joint (motor) as in the above, the state of the joint unit 152
detected by the detector 153 may be said to be an error or difference between a target angle and the actual one of each
[...] shown in FIG. 8 and others, or by software or an object program.

[0222] In this embodiment, a PWM pulse signal supplied to the motors (joints) in the leg units 3A to 3D and one
[...] PWM pulse signal used in learning [...] applied to the robot. As will be seen through comparison, the pulse
[...] pulse is generally symmetrical with respect to "0" (X-axis).
[0223] In the external-force learning, the patterns of PWM pulse width (more specifically, vectors) taking place when
various external forces are applied to the robot 1 are used as learning data in the [...].

[0224] [...] of a layer-connection type network is used for the [...]. [...] back-propagation neural network
is highly adaptive to pattern recognition, and this embodiment [...] layer) 162 and an output layer 163
as shown in FIGS. 29 and 30.

[0225] [...] back-propagation neural network, when information (pulse width) $D_{in}$ is supplied from a
touch sensor to the input layer 161 after learning an external force, the output layer 163 will output information [...]
the type of the external force learned, corresponding to the information $D_{in}$ supplied from the touch sensor.

[0226] The input layer 161, hidden layer 162 and output layer 163 in the three-layer back-propagation neural network
are constructed as will be described below.

[0227] The input layer 161 has a plurality of neurons (18 neurons in this embodiment). That is, the input layer 161
is supplied with 18 data items for the external-force learning. The robot 1 has three types of current postures, "standing",
"sitting" and "sleeping", for example. The widths of the PWM pulse signals supplied to each joint (the motor in the joint) are of
15 types (12 types (= 3 types x 4) for supply to the four leg units and 3 types for supply to the head unit). Therefore,
a total of 18 data items (3 posture values and 15 PWM pulse widths) are supplied to the input layer 161.

[0228] The current postures are used for the external-force learning because the state of a joint depends upon the posture
of the robot 1, namely, because the pulse width depends upon the posture of the robot 1.

[0229] The input layer 161 is supplied with a pattern, namely a vector composed of various pulse widths, as information
$D_{in}$ from the touch sensor. Note that in this embodiment, since the input pulse width takes a value within the range of
[-512, 512], it is normalized by the following equation (7):

\[ \mathrm{input} = \frac{P + |P_{\min}|}{P_{\max} + |P_{\min}|} \qquad (7) \]

where $P$ is a measured pulse width, $P_{\max}$ is the maximum pulse width (512) and $P_{\min}$ is the minimum pulse width (-512).
Since the input data about the posture takes a value of either 0 or 1, it does not have to be normalized.
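
As a minimal sketch of how the 18-element input described in paragraphs [0227] to [0229] could be assembled, the following Python code normalizes 15 pulse widths with equation (7) and prepends three posture values. The function names, the ordering of the elements and the one-of-three encoding of the posture are assumptions made for illustration; the text states only that each posture value is 0 or 1.

```python
P_MAX = 512   # maximum pulse width, from the text
P_MIN = -512  # minimum pulse width, from the text

# The order of the postures is an assumption made for this sketch.
POSTURES = ("standing", "sitting", "sleeping")


def normalize_pulse_width(p, p_max=P_MAX, p_min=P_MIN):
    """Equation (7): map a pulse width in [p_min, p_max] to [0, 1]."""
    return (p + abs(p_min)) / (p_max + abs(p_min))


def build_input_vector(posture, pulse_widths):
    """Return 3 posture flags followed by 15 normalized pulse widths."""
    if posture not in POSTURES:
        raise ValueError(f"unknown posture: {posture}")
    if len(pulse_widths) != 15:
        raise ValueError("expected 15 pulse widths (12 leg joints + 3 head joints)")
    flags = [1.0 if posture == name else 0.0 for name in POSTURES]
    return flags + [normalize_pulse_width(p) for p in pulse_widths]
```

The posture flags are passed through unchanged because they are already 0 or 1, which matches the remark that only the pulse widths need normalizing.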
[0230] The hidden layer 162 has a plurality of neurons (17 neurons in this embodiment). The number of neurons
is determined by a so-called rule of thumb. That is, the number of neurons in the input layer 161 and that in the output
layer 163 are averaged and the result is smoothed to determine the number of neurons. The number of neurons
"numOfHidden" in the hidden layer 162, determined by the rule of thumb, is given by the following equation (8):

\[ \mathrm{numOfHidden} = \frac{\mathrm{numOfInput} + \mathrm{numOfOutput}}{2} + \alpha = 14 + \alpha \qquad (8) \]

where "numOfInput" is the number of neurons in the input layer 161, "numOfOutput" is the number of neurons in the
output layer 163 and $\alpha$ is a value which is increased or decreased by the smoothing. Placing "18" as the number of
neurons "numOfInput" in the input layer 161 and "10" as the number of neurons "numOfOutput" in the output layer 163
in the equation (8) provides "17" neurons "numOfHidden" in the hidden layer 162, which corresponds to $\alpha = 3$.
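
The arithmetic of equation (8) can be checked with a short Python sketch. The function name is hypothetical, and the use of integer division is an assumption; the value 3 for the adjustment follows from the stated result of 17 neurons for an average of 14.

```python
def num_hidden_neurons(num_input, num_output, alpha):
    """Equation (8): average of the input and output sizes plus an adjustment."""
    return (num_input + num_output) // 2 + alpha


# 18 input neurons and 10 output neurons average to 14; an adjustment of 3 gives 17.
hidden = num_hidden_neurons(18, 10, 3)
```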
[0231] The output layer 163 has a plurality of neurons (10 neurons in this embodiment). With the 10 neurons in the
output layer 163, the robot 1 can recognize 10 types of external force by learning. Namely, the robot 1 can recognize
10 types of external force including, for example, "ForceForward" (external force by which the robot 1 is pushed forward
on the back, as in FIG. 20), "ForceBackward" (external force by which the robot 1 is pushed backward on the back,
as in FIG. 21), "RightHandUp" (external force by which the right hand is raised, as in FIG. 24), "LeftHandUp" (external
force by which the left hand is raised, as in FIG. 25), "BothHandUp" (external force by which both the hands are raised,
not shown), "HeadUp" (external force by which the head is raised, as in FIG. 22), "HeadDown" (external force by which
the head is pushed down, as in FIG. 23), "HeadRight" (external force by which the head is pushed to right, not shown),
"HeadLeft" (external force by which the head is pushed to left, not shown) and "Noforce" (no external force applied,
as shown in FIG. 9).

[0232] The input layer 161, hidden layer 162 and output layer 163 are constructed as in the above. Various
input/output functions may be used in the hidden layer 162 and output layer 163, but in this embodiment, the so-called sigmoid
function is used. Unlike the so-called threshold function, the sigmoid function has the characteristic that it provides
an output which changes smoothly with respect to the input sum, as shown in FIG. 31.

[0233] The three-layer back-propagation neural network is used to learn various external forces as in the following.
[0234] The external-force learning is done by supplying the network (learning unit 160) with a pair of input vector
data and teaching signal vector data as shown in FIG. 30. Trainer vector data is supplied so that a certain neuron
outputs "1" while the other neurons output "0". Namely, "1" is supplied for an external force of a type to be recognized while
"0" is supplied for all external forces of other types.
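
The trainer vector described in paragraph [0234] is what is commonly called a one-hot vector. A minimal Python sketch follows; the ordering of the ten force types and the function name are assumptions made for illustration, while the names themselves are those listed for the output layer 163.

```python
# Names follow the ten force types listed for the output layer; the order is assumed.
FORCE_TYPES = (
    "ForceForward", "ForceBackward", "RightHandUp", "LeftHandUp",
    "BothHandUp", "HeadUp", "HeadDown", "HeadRight", "HeadLeft", "Noforce",
)


def trainer_vector(force_type):
    """Return a vector with 1 for the given force type and 0 for all others."""
    index = FORCE_TYPES.index(force_type)  # raises ValueError for an unknown type
    return [1.0 if i == index else 0.0 for i in range(len(FORCE_TYPES))]
```

For example, `trainer_vector("ForceBackward")` yields a vector whose second element is 1 and whose other nine elements are 0.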

[0235] The middle or hidden layer 162 provides an output $y_j^{(1)}$ as a result of the input sum by computing the sigmoid
function "sigmoid()" as given by the following equation (9), and the output layer 163 provides an output $y_j^{(2)}$ as a result of the
input sum by computing the sigmoid function "sigmoid()" as given by the following equation (10). The following equation
(11) is used to renew the weight, that is, to learn the weight. The sigmoid function "sigmoid()" is a function given by
the following equation (12).

\[ y_j^{(1)} = \mathrm{sigmoid}\left( \sum_{i=0}^{\mathrm{numOfInput}} W_{ij}^{(1)} a_i \right) \qquad (9) \]

\[ y_j^{(2)} = \mathrm{sigmoid}\left( \sum_{i=0}^{\mathrm{numOfHidden}} W_{ij}^{(2)} y_i^{(1)} \right) \qquad (10) \]

\[ W_{ij}^{(m+1)}(t) = W_{ij}^{(m+1)}(t-1) - \varepsilon\, y_i^{(m)}(t)\, z_j^{(m+1)} + \beta\, W_{ij}^{(m+1)}(t-1) \quad (m = 0, 1) \qquad (11) \]

\[ \mathrm{sigmoid}(x) = \frac{1}{1 + \exp(-x)} \qquad (12) \]

where $a_i$ is the width of each input pulse signal, $z_j$ is an error back-propagation output, $\varepsilon$ is a learning function and $\beta$
is a moment coefficient. The learning function ($\varepsilon$) and moment coefficient ($\beta$) are factors greatly affecting the learning
speed. For example, in the robot 1 constructed as in this embodiment, the learning speed can be made optimum with
$\varepsilon = 0.2$ and $\beta = 0.4$.
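
The forward pass and one weight update of such a three-layer network can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: bias terms are omitted, the error terms use the usual derivative of the sigmoid, and the momentum term is written in the common form in which the previous weight change is scaled by the moment coefficient, which may differ from the exact form of equation (11). The layer sizes (18, 17, 10) and the values 0.2 and 0.4 are taken from the text; the random sample and the function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Equation (12)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_out, n_in, rng):
    """Random initial weights, one row per neuron of the receiving layer."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def forward(a, w1, w2):
    """Hidden and output activations for input a (no bias terms in this sketch)."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, a))) for row in w1]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    return hidden, output


def train_step(a, teacher, w1, w2, dw1, dw2, eps=0.2, beta=0.4):
    """One back-propagation update; returns the mean squared error before it."""
    hidden, output = forward(a, w1, w2)
    # Error terms of the output layer (sigmoid derivative is y * (1 - y)).
    z2 = [(y - t) * y * (1.0 - y) for y, t in zip(output, teacher)]
    # Error terms of the hidden layer, propagated back through w2.
    z1 = [h * (1.0 - h) * sum(z2[k] * w2[k][j] for k in range(len(z2)))
          for j, h in enumerate(hidden)]
    for k, row in enumerate(w2):
        for j in range(len(row)):
            dw2[k][j] = -eps * z2[k] * hidden[j] + beta * dw2[k][j]
            row[j] += dw2[k][j]
    for j, row in enumerate(w1):
        for i in range(len(row)):
            dw1[j][i] = -eps * z1[j] * a[i] + beta * dw1[j][i]
            row[i] += dw1[j][i]
    return sum((t - y) ** 2 for t, y in zip(teacher, output)) / len(output)


rng = random.Random(0)
w1, w2 = make_layer(17, 18, rng), make_layer(10, 17, rng)
dw1 = [[0.0] * 18 for _ in range(17)]
dw2 = [[0.0] * 17 for _ in range(10)]
sample = [rng.random() for _ in range(18)]   # hypothetical normalized input
teacher = [0.0, 1.0] + [0.0] * 8             # "1" for one force type only
errors = [train_step(sample, teacher, w1, w2, dw1, dw2) for _ in range(200)]
```

Repeating the step on the same sample drives the recorded error down, which mirrors the repeated presentation of input and trainer vectors described in the text.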

[0236] Entry of input vector data and that of trainer vector data are repeated a plurality of times until the difference
or error between the input vector data and trainer vector data supplied to the neural network is less than a threshold.
Then, the learning is ended. For example, the learning is ended when a mean error as given by the following equation
(13) is less than a threshold:

\[ \mathrm{error} = \frac{\sum |te - a|^2}{\mathrm{numOfOutput}} \qquad (13) \]

where $a$ is input vector data and $te$ is trainer vector data.
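
The stopping rule built on equation (13) can be sketched in Python as below. The function names and the threshold value used in the usage lines are hypothetical; the text does not state a threshold. The vector compared with the trainer vector is assumed here to have one element per output neuron.

```python
def mean_error(te, a):
    """Equation (13): sum of squared differences divided by the number of outputs."""
    if len(te) != len(a):
        raise ValueError("vectors must have the same length")
    return sum((t - x) ** 2 for t, x in zip(te, a)) / len(te)


def learning_finished(te, a, threshold):
    """True when the mean error has fallen below the threshold."""
    return mean_error(te, a) < threshold


# Usage with a hypothetical threshold of 0.01.
done = learning_finished([0.0, 1.0, 0.0], [0.05, 0.95, 0.02], 0.01)
```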

[0237] For example, the robot 1 is made to learn the same data repeatedly 10 times in an on-line learning (consecutive
learning). Also, the robot 1 is made to learn data of the same pattern continuously about 20 times. Thereby the robot
1 will have learned a total of about 800 samples.

[0238] FIG. 32 shows an example of the relation between the number of times of learning and the mean error. As seen
from FIG. 32, the mean error is at its minimum when learning has been made about 50 times, which means that the learning
has ended after about 50 trials. Note that the initial value of a weighting factor is normally given at random. How many
times learning should be repeated depends upon the initial value. That is, depending upon the initial value, the
learning will end after about 50 trials in one case, as in the above, but after about 150 trials in
another case.

[0239] The three-layer back-propagation neural network is used to learn an external force as in the above. Thus, the
robot 1 can receive a plurality of types of input external forces (an external force supplied a plurality of times) and learn
the plurality of input external forces in association with a state of the joint unit 152 (e.g. PWM pulse width) to categorize
(classify) the external force. Note that whether an external force has been successfully categorized is confirmed
by a so-called versatility test by the robot 1, for example.

[0240] More particularly, for example, when an external force is applied to the waist of the robot 1, the robot 1
recognizes, through the above-mentioned learning, that the external force applied to the waist is included in the plurality
of types of external forces it has learned, more specifically, that the pulse width (pattern) caused by the external
force applied to the waist corresponds to one of the PWM pulse widths (patterns) supplied to each joint unit 152, and thereby
sits as the corresponding sitting motion. Thus, the freedom of instruction by touching (external force) by the user can
be enlarged so that the robot 1 makes many types of motions.

[0241] Note that in the above, learning using the three-layer back-propagation neural network has been described,
but the learning unit can of course use any other method of learning. For example, the learning unit can use an SVM
(support vector machine) to categorize an external force applied to the robot 1. The SVM is a method for linearly
classifying external forces as in the "Perceptron". More specifically, the SVM maps data once into a nonlinear space
and determines a separating hyperplane in that space, and thus can solve an actually nonlinear problem.

[0242] Normally, in pattern recognition, for a test sample $x = (x_1, x_2, x_3, \ldots, x_n)$, a recognition function $f(x)$ given
by the equation (14) can be determined:

\[ f(x) = \sum_{j=1}^{n} v_j x_j + b \qquad (14) \]

[0243] On the assumption that the supervised label is $y = (y_1, y_2, y_3, \ldots, y_n)$, the problem of minimizing $\|v\|^2$ should be
solved under the constraints given by the following expression (15):

\[ \text{Constraints}: \; y_i (v^T x_i + b) \ge 1 \qquad (15) \]

[0244] Such a constrained problem can be solved with Lagrange's method of undetermined multipliers. By
introducing a Lagrange multiplier, the problem can be expressed as given by the following equation (16):

\[ L(v, b, \alpha) = \frac{1}{2} \|v\|^2 - \sum_i \alpha_i \left( y_i (x_i^T v + b) - 1 \right) \qquad (16) \]

[0245] By partial differentiation with respect to b and v as in the following equation (17), the problem reduces to the quadratic
programming problem given by the following expression (18), which can then be solved. The constraints are given by the following expression (19).

\[ \frac{\partial L}{\partial b} = \frac{\partial L}{\partial v} = 0 \qquad (17) \]

\[ \max \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j \qquad (18) \]

\[ \text{Constraints}: \; \alpha_i \ge 0, \; \sum_i \alpha_i y_i = 0 \qquad (19) \]

[0246] When the number of dimensions in the feature space is smaller than the number of training samples, a
slack variable $\xi_i \ge 0$ is introduced to change the constraints to those given by the following expression (20):

\[ \text{Constraints}: \; y_i (v^T x_i + b) \ge 1 - \xi_i \qquad (20) \]

[0247] For the optimization, an objective function given by the expression (21) is optimized:

\[ \frac{1}{2} \|v\|^2 + C \sum_i \xi_i \qquad (21) \]

where C is a coefficient designating the extent to which the constraints are eased. A value for C has to be determined
experimentally. The problem concerning the Lagrange multiplier $\alpha$ is changed to the following expression (22). The
constraints are given by the following expression (23).

\[ \max \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j \qquad (22) \]

\[ \text{Constraints}: \; 0 \le \alpha_i \le C, \; \sum_i \alpha_i y_i = 0 \qquad (23) \]

[0248] With the above operations, however, no nonlinear problem can be solved. So, a kernel function $K(x, x')$, being
a nonlinear map function, is introduced; data is mapped once into a high-dimensional space and linearly separated in
that space. Thereby, the data can be handled as having been nonlinearly separated in the original dimension. Using a
map $\Phi$, the kernel function can be given by the following equation (24), and the discrimination function is given by the
following equation (25).

\[ K(x, x') = \Phi(x)^T \Phi(x') \qquad (24) \]

\[ f(\Phi(x)) = v^T \Phi(x) + b = \sum_i \alpha_i y_i K(x, x_i) + b \qquad (25) \]

[0249] An external force is learned by solving the problem given by the following expression (26), and the
constraints are given by the following expression (27):

\[ \max \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad (26) \]

\[ \text{Constraints}: \; 0 \le \alpha_i \le C, \; \sum_i \alpha_i y_i = 0 \qquad (27) \]
[0250] The kernel function can be a Gaussian kernel function given by the following equation (28):

\[ K(x, x') = \exp\left( - \frac{\|x - x'\|^2}{\sigma^2} \right) \qquad (28) \]



[0251] Action can be categorized by the SVM based on the aforementioned principle. 
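
The discrimination function of equation (25) with a Gaussian kernel can be sketched in Python as follows. The sketch assumes that the multipliers, labels, support vectors and offset have already been obtained by solving the optimization problem of expressions (26) and (27), which is not done here. The function names and the exact scaling of the exponent by the square of `sigma` are assumptions made for illustration.

```python
import math


def gaussian_kernel(x, x_prime, sigma=1.0):
    """Gaussian kernel; the scaling by sigma squared is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / (sigma ** 2))


def decision_function(x, support_vectors, labels, alphas, b, sigma=1.0):
    """Equation (25): f = sum_i alpha_i * y_i * K(x, x_i) + b."""
    return sum(alpha * y * gaussian_kernel(x, sv, sigma)
               for alpha, y, sv in zip(alphas, labels, support_vectors)) + b


# Hypothetical one-dimensional example with one support vector per class.
svs, ys, als = [[0.0], [2.0]], [1, -1], [1.0, 1.0]
near_positive = decision_function([0.0], svs, ys, als, b=0.0)
near_negative = decision_function([2.0], svs, ys, als, b=0.0)
```

The sign of the returned value gives the class: positive near the support vector labeled +1 and negative near the one labeled -1.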

[0252] In the foregoing, the learning of an external force by the robot 1 based on the state of a joint unit has been described.
However, the robot 1 may also simply detect an external force acting on it based on the state of a
joint. To this end, the robot 1 includes a detector to detect the state of a joint which moves the moving member and a
detector to detect an external force acting on the moving member on the basis of the joint state detected by the joint
state detector. These detectors may be the detector 153 shown in FIG. 28.

[0253] The robot 1 constructed as in the above can detect, based on the state of a joint, that an external force has
been applied thereto. The joint state detector and external force detector may be implemented by software or an
object program, for example. Therefore, the robot 1 can detect an external force applied thereto without dedicated
detectors (hardware). Also, it can be said that the robot 1 can learn an external force without any such special elements.
[0254] Note that the above-mentioned external force detecting system in the robot 1 is a so-called external force
detector. The external force detecting system may of course be applied to apparatuses other than the robot 1.
[0255] In this embodiment, the PWM pulse signals supplied to the motors forming the joints of the leg units 3A to 3D
and those supplied to the motors forming the joints of the body unit 2 and head unit 4 are used for learning external
forces. However, the present invention is not limited to these PWM pulse signals; a PWM pulse signal supplied to
a motor forming any other joint may be used.

[0256] Also in the above, use of the PWM pulse signal for learning an external force has been described. However,
the present invention is not limited to a PWM pulse signal; any signal which varies depending upon an external
force may be used for learning the external force.

(4-4) Recognition of speech signal (detailed description of the speech recognition unit)

[0257] Next, the recognition of speech signals will be described in detail. To recognize speech signals, the robot 1
includes a speech (acoustic) signal input unit 121, a feature extractor 122 and an HMM unit 123, as shown in FIG. 33.
The feature extractor 122 and HMM unit 123 are included in the speech recognition unit 101 shown in FIG. 15.
[0258] The acoustic signal input unit 121 is supplied with sounds from around the robot 1. Namely, it is the
aforementioned microphone 23, for example. An acoustic signal (speech signal) from the acoustic signal input unit 121 is
outputted to the feature extractor 122.

[0259] The feature extractor 122 detects a feature of the acoustic signal and outputs it to the downstream HMM
unit 123.

[0260] The HMM unit 123 adopts a hidden Markov model which classifies the input acoustic signal based on the
detected feature. For example, it identifies the acoustic signal on the basis of a plurality of classes. Then, the HMM
unit 123 outputs the result of the recognition made for each of the classes as a probability that
each class corresponds to a word, for example, as a vector value.
[0261] Owing to the above components, the robot 1 identifies an input speech from the microphone 23 or the like as
a phoneme sequence.

[0262] Then, information $[S_0, S_1, S_2]$ of a language recognized by the HMM unit in the speech recognition unit 101
is supplied, along with information $[V_0, V_1, V_2, V_3, V_4]$ on a motion acquired by the sensor data processor
102, to the associative memory/recall memory 104 as shown in FIG. 34.

[0263] In learning, the associative memory/recall memory 104 stores the above information in association with each
other. After the learning, the associative memory/recall memory 104 will output action information based on the input
information, for example, action information $[B_0, B_1, B_2, B_3]$ in the form of a vector value.
[0264] As shown in FIG. 35, for example, in case the word "backward" as a result of the speech recognition and the
vector [0.1, 0.9, 0.2, 0.1, 0.3] as a result of the action acquisition have been supplied to the associative memory/recall
memory 104 for the purpose of learning, if the uttered word "backward" is supplied to the associative memory/recall
memory 104 after the learning, the associative memory/recall memory 104 will output action information [0, 1, 0, 0, 0]
for a "backward" motion.
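
The example of paragraph [0264] can be reproduced with a toy Python sketch. It is not the associative memory of the embodiment: the class name is hypothetical, the recognized word is used directly as a key instead of a vector from the HMM unit, and the winner-take-all conversion of the stored motion vector is an assumption chosen only because it yields the output quoted in the text.

```python
class AssociativeMemory:
    """Toy word-to-action store used only to illustrate learning and recall."""

    def __init__(self):
        self._actions = {}

    def learn(self, word, motion_vector):
        """Store a winner-take-all copy of the motion vector under the word."""
        winner = max(range(len(motion_vector)), key=motion_vector.__getitem__)
        self._actions[word] = [1 if i == winner else 0
                               for i in range(len(motion_vector))]

    def recall(self, word):
        """Return the stored action vector, or None for an unknown word."""
        return self._actions.get(word)


memory = AssociativeMemory()
memory.learn("backward", [0.1, 0.9, 0.2, 0.1, 0.3])
action = memory.recall("backward")
```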

[0265] In the foregoing, the learning by associative recalling by the robot 1 has been described. Next, learning
by shared attention, which facilitates identification of a target object, will be described.

(5) Shared attention 

[0266] The robot 1, in order to learn a speech or image, is designed to identify a specific speech or image against
background noise and take it as a target object. The shared attention facilitates identification of such a target object. For example,
the shared attention is made possible by generating stimuli by which a trainee (robot 1) can specify a target object,
such as shaking or swinging the target object (visual stimulus) or uttering (acoustic stimulus).
[0267] For the shared attention, the robot 1 includes an image signal input unit 131, a segmentation unit 132 and a
target object detector 133, as shown in FIG. 36. The segmentation unit 132 and target object detector 133 function to
identify a target object. The aforementioned action generator 105 provides action based on the target object information
thus identified and stored in the associative memory/recall memory 104 and information on a new object.
[0268] The image signal input unit 131 images the surroundings of the robot 1. More specifically, it is the CCD camera
20 as shown in FIG. 8. An image signal from the image signal input unit 131 is supplied to the segmentation unit 132.
[0269] The segmentation unit 132 makes a segmentation of the image signal, for example, a segmentation according
to colors. "Segmentation" means identifying an area in an image and examining features of the area or mapping the features
in a feature space. Segmentation makes it possible to differentiate between a target object and the background in a picked-up
image. An image signal thus segmented by the segmentation unit 132 is supplied to the downstream target object detector
133.

[0270] The target object detector 133 detects (identifies) a remarkable portion (target object) from the image
information segmented as in the above. The detector 133 detects, for example, a moving portion, that is, a portion varying
with the elapse of time, as a target object from the segmented image information when the portion fulfils a certain
requirement. A target object is detected as will further be described below.

[0271] First, a remarkability level is set for a moving portion (time-varying portion) as a specific area in a segmented
image. The "remarkability level" is an index for identification of a target object. When a target object is identified based
on a motion, the remarkability level will vary with the motion.

[0272] The specific area is traced to judge, according to its remarkability level, whether it is a target object or not.
When the remarkability level meets a certain requirement, the robot 1 is made to identify the specific area as a target
object, namely, the robot 1 is caused to "remark" it.

[0273] For identification of a target object according to its motion, the trainer of a real dog or the like, when having
the dog remark the target object, will shake or swing the target object to attract the attention of the dog. For example, when
teaching a trainee a "glass", the trainer will shake it while uttering "glass" to the trainee.

[0274] The target object detector 133 traces a specific area, and when its remarkability level is as predetermined,
namely, when the motion varies by a predetermined amount, the target object detector 133 will identify the specific area
as a target object to have the robot 1 pay attention to the specific area. More particularly, when the remarkability level
is equal to or exceeds a threshold, the robot 1 will pay attention to the specific area.
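
The threshold test on the remarkability level can be sketched in Python as below. How the level is computed from the motion is not specified in the text; accumulating the frame-to-frame displacement of the centre of the traced area is an assumption made purely for illustration, and the function names are hypothetical.

```python
import math


def motion_amount(prev_center, center):
    """Displacement of the traced area between two frames."""
    return math.dist(prev_center, center)


def track_area(centers, threshold):
    """Return the frame index at which the area becomes a target, or None.

    The remarkability level is modelled as the accumulated displacement of
    the area centre, which is an assumption of this sketch.
    """
    level = 0.0
    for frame in range(1, len(centers)):
        level += motion_amount(centers[frame - 1], centers[frame])
        if level >= threshold:
            return frame
    return None
```

An area that is shaken back and forth accumulates displacement quickly and crosses the threshold, whereas a stationary area never does.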

[0275] As in the above, the target object detector 133 sets a remarkability level for a specific area by means of the
segmentation unit 132 to detect (identify) a target object.

[0276] The image signal input unit 131 includes the segmentation unit 132 and target object detector 133 to enable
the robot 1 to make a shared attention.
[0277] Thus, the robot 1 can appropriately identify a target object to appropriately learn it in association with image
information or action as in the above.

[0278] In the above embodiment, a target object is identified based on a motion of an object in the shared attention
by the robot 1. However, the present invention is not limited to this manner of identification. For example, a target object
can be identified based on a speech. In this case, when a speech takes place, the robot 1 will direct itself towards the
origin of the speech and identify the speech as a target object. For example, a remarkability level is set for a speech,
namely, for the direction towards the speech and the volume of the speech, and the speech is identified as a target object when
the direction and volume meet certain requirements.

[0279] Also, the attention to a target object may attenuate with the lapse of time. Alternatively, it may be set to attenuate
when the association with a target object becomes stable. Thereby, it is possible to pay attention to a new stimulus
(image input and speech input) and trigger (start) learning.

[0280] Otherwise, when an object is remarked, a large value may be set as a threshold so as to attenuate under a
certain condition, for example, with the elapse of time. Also, a remarkability level may be set for two or more objects at the
same time. The remarkability level is set according to a motion of an object or a speech. For example, a remarkability level of
a motion may be set for one object while a remarkability level of a speech may be set for the other object.
[0281] Thus, while a remarked object (specific area) is being examined (for example, features such as color, shape,
etc. are being examined), a remarkability level can be set for any other object by another stimulus (e.g., speech or
image). It should be noted here that since the object currently being remarked has a higher remarkability level as
mentioned above, the object previously selected can continue to be examined for a while even when a remarkability
level is set for another object by such another stimulus.

[0282] When the remarkability of an object being currently remarked has attenuated, attention of the robot 1 may be 
turned to an object having another stimulus, namely, an object whose remarkability has increased. 
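
The attenuation of attention and the switch to a newly stimulated object can be sketched in Python as follows. The multiplicative decay, the additive stimulus and all function names are assumptions made for illustration; the text states only that the remarkability attenuates under a certain condition such as the elapse of time.

```python
def attenuate(levels, decay=0.9):
    """Reduce every remarkability level by a constant factor (assumed model)."""
    return {name: level * decay for name, level in levels.items()}


def stimulate(levels, name, amount):
    """Raise the level of one object in response to a stimulus."""
    updated = dict(levels)
    updated[name] = updated.get(name, 0.0) + amount
    return updated


def attended_object(levels, threshold):
    """Object with the highest level, if that level reaches the threshold."""
    if not levels:
        return None
    name = max(levels, key=levels.get)
    return name if levels[name] >= threshold else None


levels = {"ball": 10.0, "cup": 4.0}
first = attended_object(levels, 3.0)                    # "ball" is attended first
levels = stimulate(attenuate(levels, 0.5), "cup", 4.0)  # time passes, cup is shaken
second = attended_object(levels, 3.0)                   # attention moves to "cup"
```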
[0283] Also, the shared attention can be effected with a motion of a target object as a stimulus as well as with the
user's or trainer's finger as a stimulus. That is to say, an object pointed to by the finger can be identified as a target object.
[0284] The aforementioned shared attention is as experienced in an ordinary interaction between persons. In this
case, a skin-color area, for example, detected through the segmentation is taken as a specific area and attention is
paid to the area, as will be described below with reference to FIG. 37.

[0285] As shown in FIG. 37A, it is assumed that an image has been picked up in which a person points to a
cone 141 with the hand 142 in an environment. Note that in the processing which will be described below, an object
may be subjected to image processing, for example, by supplying it to a low-pass filter, in view of the speed of
computation, etc.

[0286] A skin-color portion is extracted from the image. In this case, a color feature space is used to detect a feature 
f the skln-color portion and extract the skin-color portion from the Image. Thus, the hand 142 Is extracted as shown In 
30 FIG.37B. 

[0287] Then, the longitudinal direction of the hand 142 is detemiined as shown in FIG. 37C because the shape of 
the hand pointing to an object is generally rectangular. For example, the longitudinal direction is determined as indicated 
with a line 143 In FIG. 37C. 

[0288] Further, the longitudinal direction thus detenmined is set in the original image as shown in FIG. 37D to Identify 
the object as shown In FIG. 37E. That is, a cone 141 pointed by the finger is Identified. For example, an image of the ' 
ends of fingers is taken out as a sample, the color of the image is identified in the color feature space, to thereby identify 
an area having the color. Thereby it is possible to identify the cone 141 having the same color, for example, yellow. 
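
One way to picture the steps of paragraphs [0286] to [0288] is the sketch below. It assumes the skin-color pixels have already been extracted as a list of (x, y) coordinates, takes the longitudinal direction to be the principal axis of those pixels, and picks the candidate object closest to the line through the hand. The principal-axis computation and the function names are assumptions made for illustration; the text does not say how the direction is computed, and the sketch ignores which end of the hand carries the finger.

```python
import math


def principal_axis(pixels):
    """Return (centroid, unit direction) of the long axis of (x, y) pixels."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    syy = sum((y - cy) ** 2 for _, y in pixels) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # major-axis orientation
    return (cx, cy), (math.cos(theta), math.sin(theta))


def distance_to_line(point, origin, direction):
    """Perpendicular distance from point to the line origin + t * direction."""
    px, py = point[0] - origin[0], point[1] - origin[1]
    return abs(px * direction[1] - py * direction[0])


def pointed_object(hand_pixels, candidates):
    """Pick the candidate (name -> centre) nearest to the hand's long axis."""
    origin, direction = principal_axis(hand_pixels)
    return min(
        candidates,
        key=lambda name: distance_to_line(candidates[name], origin, direction),
    )


# A roughly rectangular, horizontal "hand" of 10 x 3 pixels.
hand = [(x, y) for x in range(10) for y in range(3)]
target = pointed_object(hand, {"cone": (30, 1), "ball": (5, 20)})
```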
[0289] Also, the shared attention is not limited to the aforementioned one, but it may be such that attention is paid
to an object to which the sightline of the trainer or user is directed, for example.

[0290] Also, there may be provided a means for checking whether the robot 1 is making a shared attention. That is,
when a target object is identified by the shared attention, the robot 1 makes predetermined action. For example, when
the robot 1 has identified (traced) an object shaken for teaching to the robot 1, it is caused to make action such as
shaking or nodding the head, thereby informing the user or trainer that the robot 1 has identified the object. Thus, the
trainer or user can confirm whether the robot 1 has successfully traced or identified the object the trainer has shown
to the robot 1 for teaching.

[0291] As in the above, the robot 1 can evaluate its own action through such an interaction with the user or trainer
and learn the most appropriate action for itself.

[0292] Also, the robot 1 can store action in association with any other stimulus such as a speech to a sensor to make 
the action only in response to the speech. 

[0293] Next, the associative memory system will be described in detail with reference to FIG. 38. The associative
memory system shown in FIG. 38 is designed to store and recall four perception channel input patterns (color, shape,
speech and instinct). As will be seen from FIG. 38, some patterns or prototypes are prepared in advance for entry to
each of the channels including a color recognition unit 201, shape recognition unit 202, and a speech recognition unit 203,
and for example a binary ID (identification information) is appended to each of the prototypes. Each of the recognition
units 201 to 203 recognizes to which one of the prototypes an input pattern corresponds, and outputs IDs of the input
patterns, namely, a color prototype ID, shape prototype ID and speech prototype ID, to a short-term memory 211 of an
associative memory 210. A speech prototype ID from the speech recognition unit 203 is passed through a semantics
converter (SC) 204 to the short-term memory 211, to which a phoneme sequence is also sent at the same time. The
semantics converter (SC) 204 makes a semantical and grammatical tagging. Concerning the instinct, an internal states
model (ISM) unit 205 provides a variation (delta value) of an instinct (e.g., curiosity) as an analog value to the
short-term memory 211 of the associative memory 210.

[0294] The associative memory 210 includes the short-term memory 211, a long-term memory 212 and an attention
memory 213. Further, in the associative memory system, there are provided a release mechanism (RM) 221 and a
behavior network (BeNet) 222 associated with the short-term memory 211. The RM (release mechanism) 221 and
BeNet (behavior network) 222 are also called the "action generator".

[0295] In the associative memory system shown in FIG. 38, the color recognition unit 201 appends a color prototype
ID to each object segmented by a color segmentation module and supplies the data to the associative memory 210.
The speech recognition unit 203 outputs a prototype ID of a word uttered by the user or trainer and sends it along with
a phoneme sequence of the utterance to the associative memory 210. Thus, the storage and association enable
the robot 1 to utter the word. Input information from each channel is stored in the short-term memory 211 in the
associative memory 210 and held for a predetermined time, for example, for a time equivalent to a hundred objects.
[0296] The associative memory 210 recalls whether an input pattern has been stored therein in the past. If the
associative memory 210 cannot recall it, it will send the input pattern as it is to the RM 221 and BeNet 222. When the
associative memory 210 can recall it, it will append a recalling direction to the input pattern and send the data to the
RM 221 and BeNet 222.

[0297] The BeNet 222 checks a flag (shared attention flag) from a color segmentation module in the color recognition
unit 201, converts, to a latch command, whether or not there exists a shared attention made by finger-pointing by the
user as in the above, and sends the latch command to the associative memory 210. Supplied with the latch command
from the BeNet 222, the associative memory 210 makes a frame-number based search for an object matching the
frame number, and stores the object into the attention memory 213. If the delta value of the instinct is sufficiently large,
it is stored from the attention memory 213 to the long-term memory 212. The delta value of the instinct can take an
analog value such as 0 to 100. By storing a delta value of 80, a value "80" can be recalled.
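
The flow from the short-term memory through the attention memory to the long-term memory in paragraphs [0295] to [0297] might be sketched as follows. The class name, the method names and the threshold of 50 for a "sufficiently large" delta value are assumptions for illustration; the text gives no numeric threshold.

```python
from collections import deque


class AssociativeMemoryStore:
    """Toy model of the three memories; not the patented implementation."""

    def __init__(self, short_term_size=100, delta_threshold=50):
        # Short-term memory holds roughly the last hundred objects.
        self.short_term = deque(maxlen=short_term_size)
        self.attention = None       # object latched by shared attention
        self.long_term = []         # (pattern, instinct delta) pairs
        self.delta_threshold = delta_threshold  # assumed cutoff

    def observe(self, frame, pattern):
        """Store an input pattern together with its frame number."""
        self.short_term.append((frame, pattern))

    def latch(self, frame):
        """Latch command: frame-number based search into attention memory."""
        for stored_frame, pattern in self.short_term:
            if stored_frame == frame:
                self.attention = pattern
                return True
        return False

    def consolidate(self, instinct_delta):
        """Move the latched object to long-term memory if delta is large."""
        if self.attention is not None and instinct_delta >= self.delta_threshold:
            self.long_term.append((self.attention, instinct_delta))
            return True
        return False


store = AssociativeMemoryStore()
store.observe(frame=7, pattern=("yellow", "cone", "cone-word"))
latched = store.latch(frame=7)
kept = store.consolidate(instinct_delta=80)
```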

[0298] Next, the associative memory will be described in detail with reference to FIG. 39 showing an example of a
two-layer hierarchical neural network. The example shown in FIG. 39 is a competitive learning network including an
input layer 231 as the first layer and a competitive layer 232 as the second layer and in which the weight of association
between the i-th unit (neuron) of the input layer 231 and the j-th unit of the competitive layer 232 is $W_{ji}$. This
competitive learning network works in two modes: memory and recall. In the memory mode, an input pattern is
competitively stored, and in the recall mode, a pattern stored in the past is recalled from a partial input pattern. There
are provided at the input side a number m of neurons correspondingly to inputs $x_1, x_2, \ldots, x_m$ of the color, shape,
speech and instinct. For example, when 20 neurons are provided for each of color prototype ID, shape prototype ID and
speech prototype ID and 6 neurons are provided for the instinct type, the neurons count a total of 66 (20 + 20 + 20 + 6).
Each of the neurons in the competitive layer 232 depicts one symbol, and the number of neurons in the competitive
layer 232 is equal to the number of symbols or patterns which can be stored. In the competitive learning network, the
prototype IDs and instinct type can be combined in 48,000 patterns (= 20 × 20 × 20 × 6). For example, some 300 neurons
should practically be provided in the competitive learning network.

[0299] Next, the memory mode will be explained. It is assumed here that the weight $W_{ji}$ of association between the
input layer 231 and competitive layer 232 takes a value between 0 and 1. An initial association weight is determined
at random. The storage (memory) is done by selecting the one of the neurons having won the competition in the
competitive layer, and increasing the force of association between the selected neuron and input neuron (association
weight $W_{ji}$). When a prototype ID corresponding to a neuron $x_1$, for example (first color prototype ID), is recognized
for an input pattern vector $[x_1, x_2, \ldots, x_m]$, the neuron $x_1$ is triggered, and then neurons similarly recognized for
the shape and speech will be triggered sequentially. The triggered neuron takes a value "+1" while the neuron not
triggered takes a value "-1".
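
The +1/-1 input coding can be made concrete with a short sketch using the sizes given as an example in the text (20 neurons each for color, shape and speech, 6 for instinct). The function name `encode` and the 1-based ID convention are assumptions for illustration.

```python
def encode(color_id, shape_id, speech_id, instinct_id, n_proto=20, n_instinct=6):
    """Return the +1/-1 input vector for one pattern of 1-based prototype IDs."""
    sizes = [n_proto, n_proto, n_proto, n_instinct]
    ids = [color_id, shape_id, speech_id, instinct_id]
    vector = []
    for size, proto_id in zip(sizes, ids):
        if not 1 <= proto_id <= size:
            raise ValueError(f"prototype ID {proto_id} outside 1..{size}")
        # The triggered neuron takes +1; every other neuron in the channel takes -1.
        vector.extend(1 if k == proto_id else -1 for k in range(1, size + 1))
    return vector


x = encode(1, 1, 1, 1)
total_neurons = len(x)                 # 20 + 20 + 20 + 6
possible_patterns = 20 * 20 * 20 * 6   # combinations of IDs and instinct type
```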

[0300] A value of the output (competitive) neuron $y_j$ is determined for the neurons $x_i$ at the input side by computing
the following equation (29):

$$y_j = \sum_{i=1}^{m} W_{ji} x_i \qquad (29)$$



[0301] Also, a neuron which will win the competition is determined from the following:

$$\max_j \{ y_j \}$$

[0302] The association between the winner neuron and input neuron is renewed under the following Kohonen update
rule:

$$\Delta W_{ji} = \alpha (x_i - W_{ji})$$

where $\alpha$ is the coefficient of learning, and

$$W_{ji}(\mathrm{new}) = \Delta W_{ji} + W_{ji}(\mathrm{old})$$

The result is normalized by the L2 norm to provide the following equation (30):

[0303] The association weight indicates a so-called intensity of learning, and it is a "memory".
[0304] In this competitive learning network, the learning coefficient is $\alpha = 0.5$. A pattern once stored can be
recalled nearly without fail when a similar pattern is subsequently presented to the network.
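
The memory mode can be sketched in a few lines: equation (29) for the activations, selection of the winner, the Kohonen update with a learning coefficient of 0.5, and a normalization step. The class and method names are hypothetical, and dividing the winner's weight row by its Euclidean length is an assumption about what the normalization looks like, since equation (30) is not reproduced above.

```python
import math
import random


class CompetitiveMemory:
    """Minimal two-layer competitive learning sketch (illustrative only)."""

    def __init__(self, n_inputs, n_symbols, alpha=0.5, seed=0):
        rng = random.Random(seed)
        self.alpha = alpha
        # W[j][i]: weight between input neuron i and competitive neuron j,
        # initialised at random between 0 and 1.
        self.W = [[rng.random() for _ in range(n_inputs)] for _ in range(n_symbols)]

    def activations(self, x):
        """Equation (29): y_j = sum_i W_ji * x_i."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.W]

    def winner(self, x):
        """Index j of the competitive neuron with the largest y_j."""
        y = self.activations(x)
        return max(range(len(y)), key=y.__getitem__)

    def memorize(self, x):
        """Kohonen update of the winner's weights, then L2 normalization."""
        j = self.winner(x)
        row = [w + self.alpha * (xi - w) for w, xi in zip(self.W[j], x)]
        norm = math.sqrt(sum(w * w for w in row))  # assumed form of the norm
        self.W[j] = [w / norm for w in row]
        return j


# One pattern coded as +1/-1 over 66 input neurons (four triggered neurons).
x = [-1] * 66
for index in (0, 20, 40, 60):
    x[index] = 1

memory = CompetitiveMemory(n_inputs=66, n_symbols=20)
first = memory.memorize(x)
second = memory.memorize(x)
```

After the first presentation the same competitive neuron wins again for the same pattern, which is the behavior the text describes as recalling a stored pattern nearly without fail.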

[0305] Note that the associative memory should be such that in the process of consecutive learning, the memory
of a pattern stored frequently will be stronger while the memory of one not stored so frequently will be weak. Such
an associative memory is applicable in this embodiment. That is, the memory can be thus adjusted by tuning the
learning coefficient and associative memory. For example, when a small coefficient of learning is set for a pattern,
correspondingly more epochs are required to make the memory strong. Also, it is possible to lower the learning
coefficient correspondingly to the epoch, for example, to lower a coefficient of learning initially set large for a pattern
as the epoch becomes larger. Thus, the memory of a pattern presented not so frequently is not renewed many times,
with the result that the memory will become vague, a pattern other than a stored one will be recalled or a recalling
threshold cannot be reached for recalling. However, since it is possible to acquire a new symbol or pattern accordingly,
a flexible associative memory system can be implemented even if its capacity is limited.
[0306] Next, the recall mode will be explained. 

[0307] It is assumed here that a certain input pattern vector $[x_1, x_2, \ldots, x_m]$ is presented to the associative memory
system. The input vector may be either a prototype ID or a likelihood or probability of the prototype ID. The value of
the output (competitive) neuron $y_j$ is determined by computing the aforementioned equation (29) concerning the
neurons $x_i$ at the input side. Depending upon the likelihood of each channel, the triggering value of the competitive
neuron also depicts a likelihood. It is important that likelihood inputs from a plurality of channels can be connected
together to determine a general likelihood. In this embodiment, only one pattern is recalled and a winner neuron is
determined by computing the following:

$$\max_j \{ y_j \}$$

The number for a neuron thus determined corresponds to a number for a symbol, and an input pattern is recalled by
computation of an inverse matrix. That is:

$$Y = W \cdot X$$
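
A minimal sketch of the recall mode follows. It computes equation (29), picks the winner, and reads the stored pattern back from the winner's weight row, channel by channel. Reading back the weight row is a simplification used here in place of the inverse-matrix computation named in the text, and coding an unknown channel as zeros is an assumption; the function name `recall` is hypothetical.

```python
def recall(W, x, channel_sizes):
    """Return (winner index, recalled 1-based prototype ID for each channel)."""
    y = [sum(w * xi for w, xi in zip(row, x)) for row in W]  # equation (29)
    j = max(range(len(y)), key=y.__getitem__)                 # winner neuron
    ids, start = [], 0
    for size in channel_sizes:
        segment = W[j][start:start + size]
        # The strongest weight inside a channel marks the stored prototype.
        ids.append(1 + max(range(size), key=segment.__getitem__))
        start += size
    return j, ids


# Two stored symbols over two channels of two prototypes each:
# symbol 0 holds pattern [1, 1], symbol 1 holds pattern [2, 2].
W = [
    [0.5, -0.5, 0.5, -0.5],
    [-0.5, 0.5, -0.5, 0.5],
]
# Partial input: first channel is prototype 2, second channel unknown (zeros).
winner, pattern = recall(W, [-1, 1, 0, 0], channel_sizes=[2, 2])
```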
[0308] Next, the number of input pattern presentations and coupling coefficient will be described.

[0309] In this embodiment, the coefficient of learning is set high and tuned to store a presented pattern at a stroke.
The relation between the number of attempts of learning and coupling coefficient at this time is examined. The coefficient
of coupling between an input pattern and a symbolic neuron in the competitive layer can also be determined by
computing the equation (29).

[0310] FIG. 40 shows a relation (active input) between a neuron activated by an input pattern and a neuron in the
competitive layer, having acquired a symbol, and a relation (non-active input) between a neuron not activated and a
neuron in an associative layer. In FIG. 40, the horizontal axis indicates an epoch while the vertical axis indicates an
activation. As seen from FIG. 40, in case of an active input, the larger the epoch, the stronger the association between
an input pattern and symbolic neuron becomes. It is because the memory is largely renewed at the first epoch that the
association is suddenly enhanced at the second epoch. The curve of the active input will be made gentle by setting a
lower learning coefficient. In case of a non-active input, however, the association with a neuron not activated by a new
pattern is weaker.

[0311] Note that an associative memory system may be built with consideration given to the epoch as well as to the
frequency of presentation, because a pattern presented frequently should preferably be stored on a priority basis since
the storage capacity is fixed (limited). In connection with this point, it is preferable to introduce a forgetting function.
For example, a pattern having been stored by mistake due to a nonlinear factor such as noise from the recognition
unit is presented only once and need not be kept, and it is more preferable to store a newly presented important
pattern while forgetting a pattern of which the epoch is small and the frequency of presentation is low.

[0312] It should be noted that in this embodiment, the coefficient of learning is fixed and whether an input pattern is
a new one or not is checked based on a threshold, but the learning coefficient may be varied and the threshold may be
determined by a formula.
[0313] Next, the response to many input patterns will be described.

[0314] The results of tests made on the operations of the associative memory system when various patterns are
supplied thereto are shown in Table 1.



Table 1

Columns: Color, Shape, Speech, Instinct. Row labels: Memory; Memory; Recall (only color is known); Recall (thing first
stored); Recall (memory is enhanced); Recall (unknown pattern is supplied). Results are marked "Memory OK" or
"Recall OK".


[0315] In Table 1, a prototype ID of each of color, shape, speech and instinct is indicated with a number like 1, 2, ...
while a recalled prototype ID is indicated with a parenthesized number like (1), (2), ....

[0316] As will be seen from Table 2, when a color "1" and shape "3" are supplied by the fifth presentation after an
input pattern [1, 1, 1, 1] is initially stored, a pattern [1, 3, (1), (1)] is recalled based on the color 1 alone. However, by
the seventh presentation after a pattern [1, 3, 3, 1] is stored by the sixth presentation, a pattern [1, 3, (3), (1)] is recalled
in response to entry of a color "1" and shape "3".

EP 1 195 231 A1

[0317] When the storage capacity is 20 symbols, 20 types of input patterns as shown in Table 2 are normally
stored, but when more than 20 kinds of input patterns (400 patterns in total) as shown in Table 3 are presented, symbols
stored early like [1, 1, 1, 1] will be overwritten while patterns learned later will be held as a memory.

Table 2

Table 3

[0318] As shown in Table 3, only the 20 symbols learned just before the last learned one can be acquired (held).
[0319] On the assumption that whether a symbol is a new one is judged based on the condition "an input pattern
in which two or more neurons are different in activation", a plurality of things different from each other in only one of
color and shape, for example, cannot be named identically to each other. However, in case the things are different in
both color and shape from each other, they can be named identically to each other. That is to say, patterns [1, 1, 1, 1]
and [2, 1, 1, 1] cannot be stored at the same time, but patterns [1, 1, 1, 1] and [2, 2, 1, 1] can be stored together. In this
case, all input patterns as shown in Table 4 can be stored.
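
The novelty condition of paragraph [0319] can be checked with a short sketch. It interprets "two or more neurons are different" at the level of channel prototype IDs, which is an assumption chosen because it reproduces the two examples in the text; the function name `is_new_symbol` is hypothetical.

```python
def is_new_symbol(pattern, stored_patterns, min_differences=2):
    """True if pattern differs from every stored pattern in >= 2 channels."""

    def differences(a, b):
        return sum(1 for p, q in zip(a, b) if p != q)

    return all(differences(pattern, s) >= min_differences for s in stored_patterns)


stored = [(1, 1, 1, 1)]
same_name_one_change = is_new_symbol((2, 1, 1, 1), stored)   # differs in color only
same_name_two_changes = is_new_symbol((2, 2, 1, 1), stored)  # differs in color and shape
```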



Table 4

[0320] In the associative memory system as having been described in the foregoing, since the storage capacity is
limited, it should be utilized efficiently. To this end, patterns frequently presented or used should preferably be stored
on a priority basis.
[0321] Also it is preferable that, in view of the storage capacity, it should be made possible to forget a pattern which
need not be stored while storing a new input pattern which is important. To this end, the following coupling-coefficient
forgetting function f should be used:

$$W_{new} = f(W_{old})$$

where $W_{new}$ is a new coupling coefficient and $W_{old}$ is an old coupling coefficient. The simplest forgetting function is
to reduce the coefficient of coupling with the loser neuron in the competitive layer each time a pattern is presented. For
example, the new coupling coefficient $W_{new}$ may be determined as follows by the use of an old coupling coefficient
$W_{old}$ and forgetting coupling coefficient $W_{forget}$.

Thus, the coupling with a pattern not presented is weakened and it is possible to forget an unimportant pattern not
frequently used. For a humanoid robot, it is natural and preferable to set a forgetting function f based on the findings
in the field of human cerebrophysiology.
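
As an illustration of a forgetting function applied to loser neurons, the sketch below scales every loser row by a forgetting coefficient each time a pattern is presented. The multiplicative form and the value 0.95 are assumptions; the text says only that the coupling with the loser neuron is reduced using an old coupling coefficient and a forgetting coupling coefficient.

```python
def forget(W, winner, w_forget=0.95):
    """Return a copy of W with every loser row scaled by w_forget."""
    return [
        row[:] if j == winner else [w_forget * w for w in row]
        for j, row in enumerate(W)
    ]


W = [[1.0, 0.5], [0.4, 0.2]]
W_after = forget(W, winner=0, w_forget=0.5)
```

With a coefficient between 0 and 1, a symbol that never wins decays toward zero and its neuron becomes available for a new pattern, while the winner's row is left untouched.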

[0322] This embodiment has been described concerning the storage of a word (noun). However, it is preferable to
consider the storage of meaning and episode and acquisition of a verb. For example, a word "kick" is acquired through
acquisition of "kicking" action.

[0323] To judge whether an input pattern is a new one, a threshold is set for the activation of a winner neuron in the
competitive layer. However, since it is necessary to re-tune the activation as the number of input channels increases,
it is preferable to automatically compute an activation in the course of a program being executed, for example.
[0324] Furthermore, in case the number of input channels has increased up to a multi-modality, it is preferable to
consider the normalization of each channel.

[0325] Next, decision of behavior of the robot according to the present invention will be described concerning the
ISM (internal states model) unit 205 in FIG. 38. That is, in a robot in which an ethological approach is applied to decide
behavior, an operation test for examining the action creation is effected based on an external cause factor and internal
cause factor as will be described in detail herebelow.

[0326] In this embodiment, there are used eight gauges for the internal states and eight instincts corresponding to
the gauges, respectively. The eight gauges include Nourishment, Movement, Moisture, Urine, Tiredness, Affection,
Curiosity and Sleepy, and the eight instincts include Hunger, Defecation, Thirst, Urination, Exercise, Affection, Curiosity
and Sleepy.

[0327] The internal state varies with a time elapse informed by the biorhythm for example or with a sensor input and
success/failure in action. The range of the variation is 0 to 100, and the degree of the variation depends on a coefficient
in personality_gauge.cfg and personality_perception.cfg for example.

[0328] Also, a Frustration is created when no action can be made even with a desire having reached the maximum
value, and it is cleared when the gauge is varied by action as expected.
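
A gauge, its instinct and the Frustration of paragraphs [0327] and [0328] might be modelled as below. The class name `InternalState`, the rule `instinct = 100 - gauge` (one possible linear correspondence) and the unit increment of Frustration are assumptions; the coefficient stands in for a value that the text says comes from a configuration file, whose format is not described.

```python
def clamp(value, low=0.0, high=100.0):
    """Keep a value inside the 0 to 100 range used for internal states."""
    return max(low, min(high, value))


class InternalState:
    """One gauge/instinct pair, e.g. Nourishment and Hunger (illustrative)."""

    def __init__(self, gauge=50.0, coefficient=1.0):
        self.gauge = gauge
        self.coefficient = coefficient  # stand-in for a configured coefficient
        self.frustration = 0.0

    @property
    def instinct(self):
        # Assumed linear correspondence: an empty gauge means maximum desire.
        return 100.0 - self.gauge

    def tick(self, drain, action_taken):
        """Time elapses: the gauge drains; Frustration grows at maximum desire."""
        self.gauge = clamp(self.gauge - self.coefficient * drain)
        if self.instinct >= 100.0 and not action_taken:
            self.frustration += 1.0

    def satisfy(self, amount):
        """An action succeeds: the gauge recovers and Frustration is cleared."""
        self.gauge = clamp(self.gauge + self.coefficient * amount)
        self.frustration = 0.0


hunger = InternalState(gauge=10.0)
hunger.tick(drain=20.0, action_taken=False)
hunger.tick(drain=5.0, action_taken=False)
frustration_before = hunger.frustration
hunger.satisfy(30.0)
```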

[0329] In this operation test, a contents tree formed from a hierarchical structure (tree structure) of plural pieces of
action as shown in FIG. 41 is used as an action selection/decision system adopting the ethological approach. The
contents tree includes, from top to bottom, a system, subsystems, modes and modules. The plural pieces of action in
the higher layer are abstract ones such as a desire, while those in the lower layer are concrete ones to accomplish such
a desire. The tree shown in FIG. 41 is designed for minimum action based on an ethological model, switching to a tree
using a speech recognition and an operation test, and a test for the learning. The operation test is made with the
instincts supported by the tree in FIG. 41, that is, including Hunger, Affection, Curiosity, and Sleepy, and corresponding
gauges including Nourishment, Affection, Curiosity and Sleepy. Note that in an actual operation test, a code is used to
indicate a criterion for success or failure in execution of a module and a linear correspondence is used between the
gauge and instinct, but the present invention is not limited to these means.
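
The hierarchical contents tree can be pictured with a small data structure. The sketch below assumes that selection descends from the system toward a module by following the child with the highest motivation value; that selection rule, the `Node` class and the numeric values are assumptions for illustration, and the node names only echo trees and modules mentioned in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    motivation: float = 0.0
    children: list = field(default_factory=list)


def select_module(node):
    """Descend from the root, following the most motivated child each time."""
    path = [node.name]
    while node.children:
        node = max(node.children, key=lambda child: child.motivation)
        path.append(node.name)
    return path


tree = Node("System", children=[
    Node("Eat", 70.0, [Node("Search/Eat", 70.0)]),
    Node("Sleep", 20.0, [Node("Sleeping", 20.0)]),
])
chosen = select_module(tree)
```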

[0330] In this embodiment, an emotion is expressed along a plurality of axes. More specifically, Activation and
Pleasantness are used as the axes, and in addition Certainty is used. Namely, an emotion is expressed along these three
axes, namely, three-dimensionally. The Activation is an extent to which a creature is activated or sleeping and which
depends upon the biorhythm found mainly in the creatures, the Pleasantness is an extent indicating how much an
instinct is fulfilled or not fulfilled, and the Certainty is an extent indicating how definite a thing to which the robot
is currently paying attention is. To determine the Pleasantness, the aforementioned eight gauges and eight instincts
are used (however, up to four gauges and four instincts are used in the operation test). Each of the Activation,
Pleasantness and Certainty takes a value falling within a range of -100 to 100, and the Pleasantness and Certainty vary
with time elapse so as to always return to a value "0". Also, the Activation included in the instincts takes a value "0" and
the biorhythm takes an initial value as it is.

[0331] A fulfilment of an instinct is reflected by the Pleasantness. The Certainty with a vision object is used as it is
when there is a thing to which the robot is paying attention. The Activation basically depends upon a value of the
biorhythm, but when the Sleep has varied, the Activation is varied with the result of the Sleep variation.
[0332] In this embodiment, the above operation is restricted such that the biorhythm is reflected only by the Activation
and the Certainty is varied within a range of 0 to 100 in this case. Of course, however, the present invention is not
limited to these means.
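
The three emotion axes can be sketched as a small state object. The class name `Emotion`, the proportional relaxation toward 0 and its rate are assumptions; the ranges (-100 to 100 in general, 0 to 100 for Certainty in this embodiment) follow the text.

```python
from dataclasses import dataclass


def clamp(value, low=-100.0, high=100.0):
    """Keep a value inside the given range."""
    return max(low, min(high, value))


@dataclass
class Emotion:
    activation: float = 0.0
    pleasantness: float = 0.0
    certainty: float = 0.0

    def relax(self, rate=0.1):
        """Pleasantness and Certainty drift back toward 0 as time elapses."""
        self.pleasantness -= rate * self.pleasantness
        self.certainty -= rate * self.certainty

    def update(self, biorhythm, instinct_fulfilment, vision_certainty=None):
        """Activation follows the biorhythm; fulfilment moves Pleasantness."""
        self.activation = clamp(biorhythm)
        self.pleasantness = clamp(self.pleasantness + instinct_fulfilment)
        if vision_certainty is not None:
            # Certainty of the attended vision object, limited to 0..100.
            self.certainty = clamp(vision_certainty, 0.0, 100.0)


emotion = Emotion()
emotion.update(biorhythm=120.0, instinct_fulfilment=-30.0, vision_certainty=150.0)
emotion.relax(rate=0.5)
```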

[0333] Next, a first example of the operation test will be described concerning results of Sleep and Eat with reference
to FIGS. 42 to 44. In this first example, changes of Search/Eat and Sleeping by a module in the contents tree in FIG.
41 are examined with the instincts other than the Hunger and Sleepy being fixed. FIG. 42 shows time changes of the
Hunger and Sleepy included in the instincts, FIG. 43 shows time changes of the Activation, Pleasantness and Certainty
included in the emotions, and FIG. 44 shows time changes of the Sleep and Eat as motivations.
[0334] As will be seen from FIGS. 42 to 44, patting shifts the robot to the Sleep tree, and hitting lets the robot
go out of the Sleep tree (not shown). When the Hunger increases, the robot can go to the Eat tree. When
the Hunger is appeased, the robot can shift to the Sleep tree. The reason why the Activation will not be changed even
when the robot is hit is that the instinct is not changed since the Sleep is minimum, namely, it is -100. After the Hunger
becomes maximum, that is, it becomes 100, the Frustration will have an increased value (not shown) so that the
Pleasantness will be increased somewhat gently.

[0335] Next, a second example of the operation test will be described in which the four gauges including the
Nourishment, Affection, Curiosity and Sleepy and the corresponding instincts are used. FIGS. 45 to 47 show changes of
the robot behavior and changes in value of the instincts in the contents tree shown in FIG. 41. FIG. 46 shows time
change of the emotion, and FIG. 47 shows time changes of the release mechanism.

[0336] As shown in FIGS. 45 to 47, shift to the Sleep tree by patting the robot, shift to the Eat tree due to the Hunger
and shift to information acquisition due to the Curiosity are done effectively. When no action is made even with the
Curiosity included in the instincts being maximum (100), the Pleasantness has been changed rather toward
unpleasantness. When the robot is patted, the Pleasantness is improved and thus Comfort is sought.

[0337] The results of the operation tests show that the action selection/decision system adopting the ethological
approach based on the contents tree shown in FIG. 41 operates effectively.

Industrial Applicability
[0338] As having been described in the foregoing, in the robot according to the present invention, information supplied
just before or after a touch is detected by a touch sensor is detected by an input information detector, action made
correspondingly to the touch detection by the touch sensor is stored in association with the input information detected by
the input information detector into a storage unit, action is recalled by an action controller from the information in the
storage unit based on newly acquired input information, the action is made to store the input information and action
acquired when the input information has been detected in association with each other, and corresponding action is
made when identical input information is supplied again.

[0339] The action controlling method adopted in the robot according to the present invention includes the steps of
detecting a touch made to the robot; detecting information supplied just before or after the touch detection in the touch
detecting step; storing action made correspondingly to the touch detection in the touch detecting step and input
information detected in the input information detecting step in association with each other into a storage unit; and
recalling action from the information in the storage unit based on newly acquired input information to control the robot
to do the action.

[0340] In the robot according to the present invention, input information and action made when the input information
has been detected are stored in association with each other, and when information identical to the input information is
supplied again, corresponding action can be made.
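
The steps summarized in paragraphs [0338] to [0340] can be illustrated with a toy learner that pairs a spoken word with the action triggered by a touch when the two occur close together in time. The class name, the method names, the action labels and the one-second window for "just before or after" are assumptions made for this sketch.

```python
class TouchSpeechLearner:
    """Associates a word with a touch-triggered action (illustrative only)."""

    def __init__(self, window=1.0):
        self.window = window          # assumed time window in seconds
        self.associations = {}        # word -> action
        self.last_speech = None       # (time, word)
        self.last_touch = None        # (time, action)

    def on_speech(self, t, word):
        """Return the action recalled for the word, then update associations."""
        recalled = self.associations.get(word)
        self.last_speech = (t, word)
        if self.last_touch and abs(t - self.last_touch[0]) <= self.window:
            self.associations[word] = self.last_touch[1]   # speech just after touch
        return recalled

    def on_touch(self, t, action):
        """The touch makes the robot act; pair the action with nearby speech."""
        self.last_touch = (t, action)
        if self.last_speech and abs(t - self.last_speech[0]) <= self.window:
            self.associations[self.last_speech[1]] = action  # speech just before touch
        return action


learner = TouchSpeechLearner()
first_reply = learner.on_speech(0.0, "sit")        # nothing learned yet
learner.on_touch(0.4, "sit_down")                  # touch follows the word
second_reply = learner.on_speech(10.0, "sit")      # the word alone recalls the action
```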

[0341] Also, in the robot according to the present invention, action result information indicative of the result of action
made correspondingly to input information detected by an input information detector and the input information itself
are stored in association with each other into a storage unit, action result information in the storage unit is identified
by an action controller based on new input information, action is made based on the action result information, the input
information and the action result information indicative of the result of the action made correspondingly to the input
information are stored in association with each other, and when identical input information is supplied again, past action
can be recalled based on the corresponding action result information to make appropriate action.
[0342] The action controlling method adopted in the robot according to the present invention includes the steps of
storing action result information indicative of the result of action made correspondingly to input information detected
by an input information detector and the input information itself in association with each other into a storage unit; and
identifying action result information in the storage unit based on new input information to control the robot to make
action based on the action result information.

[0343] In the robot according to the present invention, input information and action made correspondingly to the input
information are stored in association with each other, and when information identical to the input information is supplied
again, past action can be recalled based on the corresponding action result information to make appropriate action.
[0344] Also, in the robot according to the present invention, a feature of input information detected by an input
information detector is detected by a feature detector, the input information is classified by an information classification
unit based on the feature, the robot is caused by an action controller to act based on the classification of the input
information, the classification of the input information having caused the robot action is changed by a classification
changer based on action result information indicative of the result of the action made by the robot under the control of
the action controller, and the robot is made to act correspondingly to the classification of the input information, thereby
permitting the classification of the input information to be changed based on the result of the action of the robot.
[0345] The action controlling method adopted in the robot according to the present invention includes the steps of
detecting a feature of input information detected by an input information detector; classifying the input information
based on the feature detected in the feature detecting step; controlling the robot to act based on the classification of
the input information, made in the information classifying step; and changing the classification of the input information
having caused the robot action based on action result information indicative of the result of the action made by the
robot controlled in the action controlling step.

[0346] The robot according to the present invention can act correspondingly to classification of input information and
change the classification of the input information based on the result of the robot action.

[0347] Also, the robot according to the present invention stores information on a target object identified by a target
object identification unit into a storage unit, acts based on a newly detected object and information on the target object,
stored in the storage unit to store the target object, and thus acts in a predetermined manner when an identical object
is supplied again.

[0348] Also, the action controlling method adopted in the robot according to the present invention includes the steps
of identifying a target object; storing information on the target object identified in the target object identifying step into
a storage unit; and controlling the robot to act based on a newly detected object and the information on the target
object, stored in the storage unit.
[0349] The robot according to the present invention stores a target object, and when an identification target is supplied
again, it can act in a predetermined manner.

[0350] Also, the robot according to the present invention includes moving members, joints to move the moving members,
detectors each for detection of the state of the joint to which an external force is applied via the moving member,
and a learning unit to learn the joint state detected by the detector and external force in association with each other,
so that the state of the joint to which an external force is applied via the moving member can be detected by the detector
and the joint state detected by the detector and external force can be learned in association with each other by the
learning unit. That is, the robot can learn an external force in association with a joint state which varies correspondingly
to the external force acting on a moving member.
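
The association between a joint state and an external force may be illustrated with the following sketch. It deliberately substitutes a simple nearest-centroid learner for the neural network named in the claims, so it shows only the idea of learning labelled joint states and recalling the most similar one. The class name, the force labels, and the two-element joint-state vectors are assumptions.

```python
from collections import defaultdict


class ForceLearner:
    """Hypothetical learning unit: nearest-centroid stand-in for a neural network."""

    def __init__(self):
        self._sums = {}                   # force label -> summed joint-state vector
        self._counts = defaultdict(int)   # force label -> number of samples

    def learn(self, joint_state, force_label):
        # Accumulate the joint state observed while the labelled force was applied.
        sums = self._sums.setdefault(force_label, [0.0] * len(joint_state))
        for i, value in enumerate(joint_state):
            sums[i] += value
        self._counts[force_label] += 1

    def predict(self, joint_state):
        # Return the force label whose mean joint state is closest to the input.
        def sq_dist_to_centroid(label):
            n = self._counts[label]
            return sum((s / n - v) ** 2
                       for s, v in zip(self._sums[label], joint_state))
        return min(self._sums, key=sq_dist_to_centroid)
```

After a few labelled samples have been learned, `predict` maps a newly detected joint state to the external force with which similar joint states were associated during learning.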

[0351] Also, the external force detector according to the present invention includes a joint state detector to detect the state of
a joint which moves a moving member, and an external force detector to detect an external force acting on the moving
member based on the joint state detected by the joint state detector, so that the state of the joint which moves the
moving member can be detected by the joint state detector and the external force acting on the moving member can
be detected based on the joint state detected by the joint state detector. Namely, the external force detector can detect
an external force acting on a moving member based on the state of a joint which moves the moving member.

[0352] Also, the external force detecting method according to the present invention includes the steps of detecting
the state of a joint which moves a moving member, detecting an external force acting on the moving member based
on the detected joint state, and detecting the external force acting on the moving member based on the state of the
joint which moves the moving member.
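
One way to picture the detection of an external force from a joint state is to compare the target value commanded to each joint with the value actually measured, as the claims mention. The sketch below is a minimal illustration under that assumption. The function name, the joint names, the units, and the threshold are hypothetical.

```python
def detect_external_force(target, measured, threshold=0.05):
    """Return the joints whose measured value deviates from the target value.

    target, measured: dicts mapping a joint name to a joint value (assumed units).
    The result maps each disturbed joint to its signed deviation. An empty
    result means that no external force is detected.
    """
    deviations = {}
    for joint, target_value in target.items():
        deviation = measured[joint] - target_value
        # A deviation larger than the assumed threshold is attributed to an
        # external force applied via the moving member.
        if abs(deviation) > threshold:
            deviations[joint] = deviation
    return deviations
```

The sign of each returned deviation indicates the direction in which the joint was displaced, which a later stage could use to distinguish, for example, a push from the front from a push from behind.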

[0353] Note that the present invention is not limited to the embodiments described in the foregoing. The examples of
the associative memory system and the contents tree for the operation test are non-limitative and illustrative, and
can be modified in various manners. The present invention can be modified variously without departing
from the scope and spirit of the claims given later.



Claims

1. A robot apparatus comprising:

means for detecting a touch;

means for detecting information supplied simultaneously with, or just before or after, the touch detection by the
touch detecting means;

means for storing action made correspondingly to the touch detection in association with the input information
detected by the input information detecting means; and

means for recalling action from information in the storing means based on newly acquired information to
control the robot apparatus to do the action.

2. The robot apparatus according to claim 1, wherein

the action made correspondingly to the touch detection by the touch detecting means is a result of a displacement
of a moving part due to an external loading by the touch; and

the touch detecting means detects a touch from a change of a control signal to the moving part due to the
external loading.

3. The robot apparatus according to claim 1, further comprising means for allowing the robot apparatus to act
correspondingly to the touch detection by the touch detecting means;

the storing means stores the action made correspondingly to the touch detection by the touch detecting
means and input information detected by the input information detecting means in association with each other.

4. The robot apparatus according to claim 1, wherein the input information detecting means detects at least either
image information or speech information.

5. A method for controlling the action of a robot apparatus, the method comprising the steps of:

detecting a touch made to the robot apparatus;

detecting information supplied simultaneously with or just before or after the touch detection in the touch
detecting step;

storing action made in response to the touch detection in the touch detecting step and input information
detected in the input information detecting step in association with each other into a storing means; and

recalling action from the information in the storing means based on newly acquired input information to control
the robot to do the action.

6. A robot apparatus comprising:

means for detecting input information;

means for storing the input information detected by the input information detecting means and action result
information indicative of a result of action made correspondingly to the input information detected by the input
information detecting means; and

means for identifying action result information in the storing means based on newly supplied input information
to control the robot apparatus to do action based on the action result information.

7. The robot apparatus according to claim 6, wherein an emotion is changed correspondingly to an external factor
and/or internal factor and action is made based on the state of the emotion;

the storing means stores the emotion state resulting from the action made based on the input information as
the action result information and the input information in association with each other; and

the action controlling means recalls a corresponding emotion state from the storing means based on the
input information to control the robot apparatus to act based on the emotion state.

8. The robot apparatus according to claim 6, wherein the input information detecting means detects at least either
image information or speech information.

9. A method for controlling the action of a robot apparatus, the method comprising the steps of:

storing action result information indicative of a result of action made correspondingly to input information
detected by an input information detecting means and the input information itself in association with each other
into a storing means; and

identifying action result information in the storing means based on newly supplied input information to control
the robot apparatus to make action based on the action result information.

10. A robot apparatus comprising:

means for detecting input information;

means for detecting a feature of the input information detected by the input information detecting means;

means for classifying the input information based on the detected feature;

means for controlling the robot apparatus to do action based on the input information; and

means for changing the classification of the input information having caused the robot apparatus to do the
action based on action result information indicative of a result of the action made by the robot apparatus under
the control of the action controlling means.

11. The robot apparatus according to claim 10, wherein the input information is image information or speech
information.

12. The robot apparatus according to claim 10, wherein the classification changing means changes the classification
of input information when the action result information indicates that the action result is unpleasant.

13. A method for controlling the action of a robot apparatus, the method comprising the steps of:

detecting a feature of input information detected by an input information detecting means;

classifying the input information based on the feature detected in the feature detecting step;

controlling the robot apparatus to act based on the classification of the input information, made in the
information classifying step; and

changing the classification of the input information having caused the robot apparatus to do the action based
on action result information indicative of a result of the action made by the robot apparatus controlled in the
action controlling step.

14. A robot apparatus comprising:

means for identifying a target object;

means for storing information on the target object identified by the target object identifying means; and

means for controlling the robot apparatus to act based on information on a newly detected object and
information on the target object, stored in the storing means.

15. The robot apparatus according to claim 14, wherein the target object identifying means segments input image
information to detect a time change of the segmented area and identify an object corresponding to an area whose
time change has reached a predetermined value.

16. The robot apparatus according to claim 14, wherein the target object identifying means identifies a target object
based on input speech information.

17. The robot apparatus according to claim 16, wherein the target object identifying means identifies a target object
from at least either sound volume or direction information of input speech information.

18. The robot apparatus according to claim 14, wherein the target object identifying means detects a sightline of a
trainer teaching the target object to identify the target object from the sightline.

19. A method for controlling the action of a robot apparatus, the method comprising the steps of:

identifying a target object;

storing information on the target object identified in the target object identifying step into a storing means; and

controlling the robot apparatus to act based on information on a newly detected object and information on the
target object, stored in the storing means.

20. A robot apparatus comprising:

moving members,

joints to move the moving members,

detecting means for detecting the state of the joint to which an external force is applied via the moving member;
and

means for learning the joint state detected by the detecting means and external force in association with each
other.




21. The robot apparatus according to claim 20, wherein the detecting means detects an external force acting on the
joint via the moving member as a state of the joint; and

the learning means learns the external force detected by the detecting means and external force to the
moving member in correlation with each other.

22. The robot apparatus according to claim 20, wherein the detecting means detects a difference between a target
value and measured value of the joint state; and

the learning means learns the difference between the target and measured values, detected by the detecting
means, and the external force in correlation with each other.

23. The robot apparatus according to claim 22, wherein

the detecting means detects a change of a control signal to the joint due to the external force; and

the learning means learns the changed control signal detected by the detecting means and the external force.

24. The robot apparatus according to claim 20, further comprising action controlling means for allowing the robot
apparatus to act based on a result of the learning by the learning means and the joint state after the learning.

25. The robot apparatus according to claim 20, wherein the learning means learns by a neural network including an
input layer, a hidden layer and an output layer.

26. An external force detector comprising:

means for detecting the state of a joint which moves a moving member; and

means for detecting an external force acting on the moving member based on the joint state detected by the
joint state detecting means.

27. The detector according to claim 26, wherein the detecting means detects a difference between a target value and
measured value of the joint state; and

the external force detecting means detects the external force based on the difference between the target
and measured values, detected by the detecting means.

28. The detector according to claim 27, wherein

the detecting means detects a change of a control signal to the joint due to the external force applied via the moving
member; and

the external force detecting means detects the external force based on the changed control signal detected
by the detecting means.

29. A method for detecting an external force, comprising the steps of:

detecting the state of a joint which moves a moving member;

detecting an external force acting on the moving member based on the detected joint state; and

detecting the external force acting on the moving member based on the state of the joint which moves the
moving member.